SOME CONSEQUENCES OF DEFECTIVE INTELLIGENCE THEORY


Psychologists dealing with the application of intelligence tests
seem to pass through alternating phases of uncritical overconfidence
and cynical despair with regard to the validity of their measurements.
To judge by recent utterances the fashionable phase at the moment is
disillusionment; the tests do not measure any constant characteristic
of the individual, and no two tests measure the same thing.1

The paralyzing effect of such antics upon steady investigation and
constructive theory is most apparent in social psychology, where some
far-reaching causative relations are just becoming apparent between
the dynamics of population, the dynamics of socio-political ideas,
and the static resistances arising from the distribution of intelligence
quotients.2,3 Justifiable limitations to the interpretation of tests are
indicated by Neff when he says, “Most authorities are now agreed
that a test standardized on one racial or national group cannot be
applied to a group of differing culture and background”; but he joins
absurdly in the current panic stampede from a sense of perspective
when he concludes, “All of the twenty point mean differences in IQ


1 Most of the current statements about IQ’s are really statements about special
environment skills, functional fluctuation, experimental error, etc. in unassigned
degrees, as may be represented by the following equation.


IQ (apparent) = IQ (real) + s + f + e + p,


where s is a large special factor of knowledge or skill.
f is the functional fluctuation of the individual’s intelligence, diurnally etc.
e is experimental error of measurement.
p is a factor of intelligence test sophistication.
2 Lorimer, F. and Osborn, F.: Dynamics of Population. 1934.
3 Cattell, Raymond B.: “Some Changes in Social Life in a Community with a
Falling Intelligence Quotient.” The British Journal of Psychology, Vol. xxv,
1938, pp. 430-450.
found to exist between children of the lowest and highest social status
may be accounted for entirely in environmental terms.”1
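
The decomposition given in footnote 1, IQ (apparent) = IQ (real) + s + f + e + p, may be illustrated by a short numerical sketch. The function name `apparent_iq` and every figure below are hypothetical, chosen only for illustration and drawn from no study cited here; the sketch assumes simply that the four terms combine additively, as the footnote states.

```python
def apparent_iq(real, s=0.0, f=0.0, e=0.0, p=0.0):
    """IQ (apparent) = IQ (real) + s + f + e + p, following footnote 1.

    s: special knowledge or skill, f: functional fluctuation,
    e: experimental error, p: test sophistication.
    """
    return real + s + f + e + p


# Two hypothetical subjects with the same IQ (real) but different
# contaminating terms; all numbers are invented for illustration.
subject_a = apparent_iq(100, s=8, f=-2, e=1, p=3)
subject_b = apparent_iq(100, s=-6, f=1, e=-1, p=0)
gap = subject_a - subject_b

print(subject_a, subject_b, gap)
```

Under these invented figures the two subjects differ in IQ (apparent) although their IQ (real) is identical, which is the situation the footnote describes when the terms are left "in unassigned degrees."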

That such capricious doubts can be thrown on the whole of the 
closely dovetailed superstructure of educational and social research 
data and theory is possible only because of years of neglect in regard 
to the real foundations of intelligence testing. It represents the 
cost of precipitate, incontinent, and complacent multiplication of 
intelligence tests without sound research and theory concerning the 
nature of intelligence. True, in the last eight years there has been a 
more widespread tendency for research workers to examine their tests 
more carefully in the light of general principles; but the crop of results 
in educational and social research available today springs from tests 
designed before this period. Consequently discussion gets nowhere; 
and it is logically possible for Neff,2 for example, to argue that there
are no social status differences in innate intelligence, or for Klineberg3
to argue that even the most biologically distant racial groups do not 
differ in average native ability, in face of the general sense of all the 
direct and indirect evidence to the contrary. 

In the unfortunate medley of tests employed we find only this 
much in common: that they measure a good deal of obviously acquired 
knowledge and skill, and that they are heavily weighted with scoring 
on special abilities distinct from intelligence. The meaning of the 
measurements used in these researches will probably remain forever 
obscure. There has been much painstaking work on the theme around
which most of the applied problems cluster, namely the nature-nurture
issue, but the effort expended is rendered null and void in most
instances because the experimenters argue in a circle, first putting
environmental skills in their tests and then proving that environment
affects “intelligence,” obtaining various results according to the
amount of contamination of the instrument. One might as well wipe
the slate clean of these earlier results, especially those at the
nursery school age, and begin afresh with sounder tests.


A COMMON SOURCE OF ERROR 


Instead of bringing a charge seemingly at random it would be best 
to pillory one of the leading offenders, the Binet test, on which a





1 Neff, W. S.: “Socio-economic Status and Intelligence: a Critical Survey.” 
Psychological Bulletin, Vol. xxxv, No. 10, 1938. 

2 Ibid. 

3 Klineberg, O.: Race Differences. 
surprising amount of the nature-nurture evidence is allowed to depend.
Some time ago I dealt with the objections to this test,1,2 which I will
only summarize briefly here.

(1) The component items are frequently tests of scholastic attainment
and life experience rather than “G.”

(2) The test items are too few in number (over any limited age 
range) for good consistency or validity. 

(3) The higher mental ages are not catered for. 

(4) Certain special group factors play a large part, notably “V”
or the verbal factor; the “practical ability” found by Alexander,3
El Koussy,4 and in the Chicago research,5 and almost certainly the
“F” factor of “fluency of association” which is a matter of temperament
rather than of cognitive ability.6,7

(5) If, as most clinical psychologists concede, the test is not con- 
cerned with any one ability, the use of a single quantitative value for 
the hodgepodge is meaningless. 

(6) In consequence of dilution of the “G” measurement with
scholastic attainment and life experience, which is less scattered than
“G” (e.g., the old dull child has more experience than the young bright)
the Binet does not give a standard deviation of intelligence quotients
as wide as that which actually exists.

(7) The personal situation in this form of individual testing is 
not an unmixed blessing, producing possible embarrassment in the 
subject and subjectivity of scoring in the examiner. 

The gravamen of these objections applies as much to the revisions 
of the Binet as to the original. As I have remarked elsewhere, a 
person of Binet’s lively mind would be the last to be using the Binet 
test in the present stage of advance, and “the prolonged worship of the
Binet scale has left us with an encumbering heritage of erroneous 





1 Cattell, Raymond B.: “Measurement Versus Intuition in Applied Psychology.”
Character and Personality, Vol. vi, 1937.

2 Cattell, Raymond B.: A Guide to Mental Testing, 1936.

3 Alexander, W. P.: “Intelligence, Concrete and Abstract.” Brit. Journ.
Psych. Monograph Supplement, No. 19, 1936.

4 El Koussy, A. A. H.: “The Visual Perception of Space.” Brit. Journ.
Psych. Monograph Supplement, No. 20, 1937.

5 Reports 1-9 of the Spearman-Holzinger Unitary Trait Study. Psychology
Department, University of Chicago, 1935.

6 Cattell, R. B.: “Temperament Tests. II Tests.” Brit. Journ. Psych.,
Vol. xxiv, 1933.

7 Cattell, R. B.: A Guide to Mental Testing, 1936.
conceptions, especially in matters concerning the distribution of
intelligence and its rôle to society.” This remark is quoted by Burt
in a recent article1 in which he continues to defend the Binet test, but
yet definitely admits that “The ideal plan would be to take each
separate test problem and examine its special value in a criterion of
intelligence. Curiously enough, this has rarely been attempted.”2

Bristol and the present writer in 1932 evaluated the “G” saturations
of seven of the Binet subtests in the course of producing from
eighteen types of test the best test for children of four to eight years of
age.3 Four of the seven came in the five lowest tests on the list of
eighteen, their mean intercorrelations averaging less than 0.30. As
regards the alternative basis of evaluation—that actually used in the 
Binet revisions—which considers increase of score with age to be the 
criterion of intelligence, Spearman4 has well said that it would lead
to measuring the child’s intelligence by counting his teeth. 

Evidence of the unusually heavy weighting with acquired skills is
found in such researches as that in which Freeman5 correlated scores on
intelligence tests with the estimated difference of educational background
among identical twins reared apart, with the following result:


Binet IQ difference with education difference ........... .791
Otis IQ difference with education difference ............ .547


Since the Otis itself cannot be considered entirely free from pedagogical 
effects the Binet is evidently heavily weighted in this respect. Nor is 
this surprising when one reflects on such typical items as that in which 
the child is asked to define a “guitar,” “treasury,” “milksop,” etc.
But the final reduction to absurdity of that hasty test construction
which has neither relation eduction nor “G” saturation as its guiding





1 Burt, C.: “The Latest Revision of the Binet Intelligence Test.” Eugenics
Review, Vol. xxx, No. 4, 1939.

2 With the same frankness as to the untenability of his position, Burt admits
that no two editors of the Binet agree about the order of mental age items. He
continues, “The second ‘Paper Cutting Test,’ which Terman assigns to the third
or highest level of ‘Superior Adults’ we find can be done at age fourteen” whilst
“Giving similarities between three things” is at the eleventh and
fourteenth-year-old levels in America and Britain respectively.

3 Cattell, R. B. and Bristol, H.: “Intelligence Tests for Mental Ages of Four
to Eight Years.” Brit. Journ. Ed. Psych., Vol. iii, No. II, 1933.

4 Spearman, C.: “The New Stanford Revision of the Binet.” The Human
Factor, Vol. xi, 1937.

5 Freeman, F. N., Holzinger, K. J., and Newman, H. H.: Twins: A Study of
Heredity and Environment, 1937.
principle, appears when some of these “tests” are intercorrelated.
Thus Furfey and Muhlenbein,1 taking one of the most popular of the
numerous recent infant test scales, found that the order of seventy-one
children had no correlation with the order on later testings by the
Stanford Binet.2

The Binet test is discussed more fully, not because it is most open
to criticism, but because it is most frequently used. To escape from
such a test into performance tests is to go from the frying pan into the
fire; for in avoiding knowledge and verbal skills we lose intelligence
itself, many performance tests being largely a measure of manual
dexterity.3


“GREATEST COMMON KNOWLEDGE” AMONG DIVERSE CULTURES 


In spite of the fervour with which some psychologists foster the 
impotent attitude that differences of intelligence with social status, 
race, or nature-nurture factors must remain permanently uninvesti- 
gated, the viewpoint cannot and need not be accepted. The 
possibility of finding among different culture groups a common 
ground of knowledge, on which operations of reasoning could be 
performed, is not chimerical. The following list of objects common to 
the observations of men wherever and however they live is given as 
an illustration of a possible nucleus, upon which careful investigation of 
primitive and civilized cultures might build a far longer and more 
detailed matrix of items for intelligence tests: 


Common objects: 


The human body and its parts. 

Footprints, etc. 

Trees (schematic and unspecific) (except for Eskimos!). 

Four-legged animals (schematic and unspecific). 

Earth and sky. 

Clouds, sun, moon, stars, lightning. 

Fire and smoke. 

Water and its transformations. 

Parents and children (growth) and simple family relationships (except 
in special tribes). 





1 Furfey, P. H., and Muhlenbein, J.: “The Validity of Infant Intelligence
Tests.” Journ. Genetic Psych., Vol. xl, 1932, pp. 219-223.

2 Yet this test happens to be the basis for a widely repeated conclusion that the
“intelligence” of nursery-school children has no relation to the intelligence or
social status of the parents.

3 Cox, J. W.: Manual Dexterity, 1935.
Common processes:


Breathing, choking, coughing, sneezing. 
Eating, drinking, defaecating, urinating. 
Sleeping. 

Birth and death. 

Running, walking, climbing, jumping. 
Striking, stroking. 

Sensing—seeing, hearing, smelling, tasting, etc. 
Emotional experiences, anger, grief, etc. 


If even this bare nuclear list does not provide a sufficiently rich 
variety of fundaments between which relations for intelligence tests 
can be built up, it is a reflection on the ingenuity of the psychologist. 
Of course the fundaments would have to be given in pictorial or verbal 
form and therein occurs the difficulty that words which translate with 
different connotations would have to be avoided. More serious is 
the objection that the same objects are themselves invested with 
different meanings by different cultures; but this is an objection to the 
intelligence tests suggested by the anthropologist rather than to those 
of the psychologist; for, as we shall see later, the latter can choose his 
relationships in such a way that only perceptual knowledge of the 
objects is involved. 

Field anthropologists, whom the present writer has consulted as to 
subjects which have sufficiently strong interest, familiarity, and 
universality to make a basis for reasoning tests among primitive 
peoples, generally suggest such material as is involved in hunting and 
fishing, tracking, tribal law, and case histories, genealogies and family 
relationships. These provide complete fields in which the primitive
shows highly agile reasoning powers. 

They indicate tests in the form of “following directions” and
“riddles,” which have a play value for the native. This is a possible
line of approach but is rendered difficult by the specificity of many
of the knowledge items to particular cultures and climatic regions;
we shall desert it in favor of an entirely new technique. Something
midway between this purely anthropological approach and the test
using abstract relations between common objects has, however, been
exploited with remarkable skill and success by Porteus, who has shown
in practical fashion the relative independence of environment which
such tests achieve.1 From such field work it seems clear that suitable





1 Porteus, S. D.: The Psychology of a Primitive People, 1930;
Primitive Intelligence and Environment, 1937.
tests could be built on a properly investigated “greatest common
knowledge” basis.


THE ORIGINS OF THE PERCEPTUAL INTELLIGENCE TEST 


Nevertheless we need not follow that difficult path, for recent
work has revealed a new approach. As early as 1926 Davey1 had
shown that pictorial “tests of intelligence” involved the same “G”
factor as current intelligence tests. In 1931 Line,2 while investigating
visual perception in children, discovered that a certain test involving
the eduction of relations between simple geometrical (i.e. less than
pictorial) shapes was highly saturated with “G.” Almost simultaneously
Fortes3 brought evidence towards the conclusion that valid
“G” tests could be made from relation eduction in simple
non-connotative visual material.


As Davey’s, Fortes’s, and Line’s work had only been on small
populations of one hundred, Stephenson undertook a very thorough
research and mathematical analysis on ten hundred thirty-seven
subjects. He confirmed that the same “G” factor ran through verbal
and non-verbal tests4 and proved what till then had only been suspected:
that a group factor of “verbal skill” ran through all verbal
tests.5


With this assurance Spearman published his visual perception test
with pantomime directions6 in which the items had only their “perceptual”
meaning and did not depend on “apperceptual associations,”
i.e., were geometrical rather than pictorial. Arsenian,7 Lorge,8 and
Zubin investigated the test in this country and the first showed that it





1 Davey, C. M.: “A Comparison of Group Verbal and Pictorial Tests of
Intelligence.” Brit. Journ. Psych., Vol. xvii, 1926.

2 Line, W.: “The Growth of Visual Perception in Children.” Brit. Journ.
Psych. Monograph Supplement, No. 15, 1931.

3 Fortes, M.: A New Application of the Theory of Noegenesis to the Problem of
Mental Testing. Ph. D. Thesis, Univ. of London Library.

4 Stephenson, W.: “Tetrad Differences for Non-verbal Subtests.” Journ.
Educ. Psych., Vol. xxii, 1931.

5 Stephenson, W.: “Tetrad Differences for Non-verbal and Verbal Tests.” Journ.
Educ. Psych., Vol. xxii, 1931.

6 Spearman, C.: The Spearman Visual Perception Test, 1933.

7 Arsenian, Seth: “The Spearman Visual Perception Test (Part I). With
Pantomime Direction.” Brit. Journ. Educ. Psych., Vol. vii, 1937, pp. 287-301.

8 Lorge, I., and Arsenian, S.: “A Comparison of the Scores of the Spearman
Visual Perception Test, Part I, Administered by Verbal and Pantomime
Directions.” Journ. Educ. Psych., 1938, pp. 520-522.
revealed significant differences between racial groups in situations in
which the usual tests would have been ambiguous. He also found the 
following correlations: 


Consistency coefficient ................................. 0.882 ± 0.0062
Correlation with Pintner non-language test .............. 0.610
Correlation with C A V D (pantomime directions) ......... 0.5808
Correlation with C A V D (verbal directions) ............ 0.4795


Finally the test was included in the large scale factor analysis
inquiry at Mooseheart, Illinois, under Thurstone and others, where it
was again shown to be highly “G” saturated and free from any group
factor.1

Some psychologists are slow to avail themselves of this type of test 
because its material comes seemingly from a single narrow field of 
experience, whereas they are used to sampling as widely as possible. 
We know, however, from the “Principle of Indifference of Indicator”2
that the general ability factor can be soundly measured by tests from
any field however narrow, providing on analysis they prove to have
good saturation and to be free from group factor overlap. Incidentally
the same principle promises success to culture-free intelligence
tests based on even a small nucleus of “greatest common knowledge.”

The choice of further forms of perceptual test to make suitable
subtests for a culture-free intelligence test can be guided not only by
the above specific researches, but also by the commonly accepted
observations that in general the kinds of test showing the best “G”
saturation are those involving relation and correlate eduction in a
high degree and reproduction in the lowest degree. The individual’s
general ability might, therefore, be defined by the order of complexity
of the relations which he is capable of handling. It is regrettable
that no one has yet empirically classified common relationships according
to complexity, extending from the simplest space or time relation
to the most complex relation of evidence. A notable new form based
on this principle is the “Progressive Matrix” tried out recently by
Penrose and Raven,3 which we shall describe with added modifications
below.





1 Reports 1-9 of the Spearman-Holzinger Unitary Trait Study. Psych. Dept., 
University of Chicago, 1935. 

2 Spearman, C.: Abilities of Man, 1927.

3 Penrose, L. F., and Raven, J. C.: “A New Series of Perceptual Tests:
Preliminary Communication.” Brit. Journ. Medical Psych., Vol. xvi, 1936.
CONSTRUCTION PRINCIPLES IN A COMPLETE PERCEPTUAL TEST


Aiming at deriving a culture-free intelligence test from this line
of research we finally decided on the seven subtests listed below.
The use of seven instead of one or two is not through any doubt as to 
the soundness of the principle of indifference of indicator, but to avoid 
weighting with one special factor, and, above all, to maintain interest 
through variety, an important necessity with cultural groups lacking 
habits of sustained concentration. 














                                                         Number of items
Subtests                                                 Practice   Main
                                                           part     part
Mazes ...................................................    2       10
Series ..................................................    5       15
Classification ..........................................    5       15
Progressive Matrices I relation matrix first order ......    5       15
Progressive Matrices II relation matrix second order ....    5       15
Progressive Matrices III sequence matrix ................    5       15
Mirror images ...........................................    5       15
Total ...................................................   32      100
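
The item counts in the table above can be checked by a simple tally. The sketch below is only a bookkeeping aid: the subtest names follow the descriptions of the seven subtests given later in the text, and the variable names are hypothetical.

```python
# (subtest, practice items, main items), in order of administration.
SUBTESTS = [
    ("Mazes", 2, 10),
    ("Series", 5, 15),
    ("Classification", 5, 15),
    ("Relation matrices, first order", 5, 15),
    ("Relation matrices, second order", 5, 15),
    ("Sequence matrices", 5, 15),
    ("Mirror images", 5, 15),
]

practice_total = sum(practice for _, practice, _ in SUBTESTS)
main_total = sum(main for _, _, main in SUBTESTS)
grand_total = practice_total + main_total

print(practice_total, main_total, grand_total)
```

The totals agree with the figures quoted in the text: one hundred scored items, and one hundred thirty-two when the practice items are included.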











These tests are chosen as having most consistently and in different
situations and populations manifested good “G” saturation. Their
order is dictated largely by considerations of interest. The maze test
has not always shown such good “G” correlation as the others; but as
Porteus1 has shown, it is as intriguing to primitive as to civilized people,
and can therefore act here as a good “shock absorber” before the more
artificial test forms. Series follows because that also has natural
interest and connects with natural happenings, e.g., growth. The
mirror image test, which is very simple in form, demanding only short
periods of attention, comes last, when fatigue may be present.

With the object of maintaining some direct attractiveness in the 
test items, as an ancillary to ulterior incentives, the drawings are some- 
times representative of real objects (man, animal, tree), but only 
of such objects as would be common in the above sense; and even then 





1 Porteus, S. D.: Primitive Intelligence and Environment, 1937, p. 237; “it is
also susceptible to practice improvement.”
the solution of the item is neither aided nor confused by the pictorial
associations, but depends directly on the perceptual evidence.

The subject’s conception as to what operation is required of him in
each of the seven subtests is made to depend more on worked examples
than on verbal instructions. The test could, if necessary, be given in
pantomime. In the Progressive Matrices this education of the subject
to a particular operation and mental set proceeds through carefully
graded demonstration items.

Apart from these special considerations the following precautions
found in the usual type of intelligence test have been adopted:

(1) There is a sufficient number of pass or fail items for the age
range in question: one hundred items (one hundred thirty-two including
practice) for an age range of “eleven years and upward.”

(2) Selective, not inventive, answers are required. 

(3) The items have been arranged in order of difficulty by a pre- 
liminary research. 

(4) A sufficiency of alternatives, of a “near correct” character, is
introduced to reduce the proportion of “chance correct” answers to
a low figure. It is found that six alternatives can easily be surveyed
by the subject in this type of test; where two answers are required
this reduces the “chance correct” ratio to 1:15.

(5) The main test is preceded by a “practice” part, with sufficient
interval between the practice and the main part to permit some
consolidation of notions encountered in the practice part. The results
of the practice part are thrown away, whether the subjects have done
such a test before or not. Since, as Vernon1 shows, it is “sophistication”
rather than “practice” that accounts for improvement in test
scores (largely occurring between the first testing and the repetition),
it is hoped that the greater part of test familiarity errors in individual
variation measures will be eliminated.

(6) A time limit, of a fairly generous order, seems permissible
on grounds of theory,2,3 and desirable for practical convenience. The
time assigned to each test is such that approximately seventy-five per 
cent of the subjects complete all items, and results in the whole test 





1 Vernon, P. E.: “Intelligence Test Sophistication.” Brit. Journ. Educ.
Psych., Vol. viii, 1938.

2 Spearman, C.: Abilities of Man, 1927, Chap. XIV.

3 Thorndike, E. L.: “Tests of Intelligence, Reliability, Significance,
Susceptibility to Special Training, and Adaptation to the General Nature of the Task,”
School and Society, Vol. ix, 1919.
(practice included) taking forty minutes. Because the researches which
show that “speed with intention to speed” and “G” are not separable
have been based only on civilized populations, we are not entitled to say
that the same would be true of mixed populations. Therefore, it has
been thought desirable to standardize the test under both conditions:
(a) timed and (b) with unlimited time.
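
The “chance correct” ratio quoted in precaution (4) follows from counting the equally likely answers open to a subject who guesses blindly. The sketch below assumes that every set of the required size is equally likely to be chosen and that the order of choice does not matter; the function name `chance_correct_sets` is hypothetical.

```python
from math import comb


def chance_correct_sets(alternatives, required):
    """Number of distinct answer sets when `required` alternatives
    must be picked out of `alternatives`, order being ignored."""
    return comb(alternatives, required)


one_of_six = chance_correct_sets(6, 1)  # a single answer from six
two_of_six = chance_correct_sets(6, 2)  # two answers from six

print(f"one answer required: 1 chance in {one_of_six}")
print(f"two answers required: 1 chance in {two_of_six}")
```

Requiring two answers from six thus leaves one blind guess in fifteen correct, against one in six when a single answer is required.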

Descriptions of the construction of subtests follow:

1. Mazes.—No adequate research exists as to the best design of maze
tests, save that they must be seen as a whole, not merely run through, and
that they can be scored about equally well by either time or errors.1
For interest’s sake, the present mazes were run through from within
outwards, to imaginary food, to utilize both the escape drive and the
food seeking goal, vicariously. Further, (a) the alternative paths were
placed early, to force the subject to deliberate on the maze as a whole;
and (b) the maze was designed to force the same consideration upon the
subject if he should try the short cut of running the maze backwards
(in imagination). The mazes grade from four passages to ten passages
wide. (Fig. 1.)

Fig. 1.

2. Series.—These build up from progressive variations in shape or 
size to variations, progressive or alternating, in relations between 
shapes and size. (Fig. 2.) 


Fig. 2.


3. Classification.—The “right type” and “wrong type” method of
Line,2 Spearman,3 and others has the advantage of being clear and
direct, but is too space consuming, and its simplicity is necessary only
with younger children. Picking out the odd item, on the other hand,
has a certain intrinsic fascination, and resembles operations known to
primitives (e.g., picking out the odd animal from the herd). Two odd


1 Porteus, S. D.: Primitive Intelligence and Environment, 1937.

2 Line, W.: “The Growth of Visual Perception in Children.” Brit. Journ.
Psych. Monograph Supplement, No. 15, 1931.

3 Spearman, C.: The Spearman Visual Perception Test, 1933.
items were required here from six, since the chances of “chance correct”
solutions are considerably lowered compared with one odd item.
The only other condition specially observed here was that the need of
“searching around” for the feature on which differentiation is to
depend should be cut down to a minimum, by conspicuously balancing
among all items the false irrelevant and partial differentiations. Thus
in item 18 there are no two figures having only curved lines or only
straight lines, whilst the duality of each figure is immediately
conspicuous. (Fig. 3.)
4. Relation Matrices: First Order.—The subject is required to
complete the figure by adding the fourth card, chosen from among the
six alternatives at the bottom.

The subtest begins with a plain matching, Fig. 4; continues through a
bilaterally symmetrical example, Fig. 5; and so passes into the main examples,
Fig. 6, in which a relationship has to be educed between Figs. 4 and 5
and applied to Fig. 6 to produce a correlate. This correlate can be
confirmed by a similar process beginning with the relation between
the two left-hand figures. These are really analogies tests,
overdetermined, and in perceptual form.
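
The eduction of a relation and of its correlate in a four-figure matrix may be sketched as follows. The encoding is an assumption made purely for illustration: each figure is reduced to a pair of numbers (size, shade), a relation is taken to be the attribute-wise change from one figure to another, and the names `educe_relation` and `educe_correlate` are hypothetical. The actual test items are drawings, not number pairs.

```python
def educe_relation(a, b):
    """Relation between two figures: attribute-wise change from a to b."""
    return tuple(y - x for x, y in zip(a, b))


def educe_correlate(figure, relation):
    """Apply a relation to a figure to obtain its correlate."""
    return tuple(x + d for x, d in zip(figure, relation))


# Hypothetical figures encoded as (size, shade).
top_left, top_right = (1, 0), (2, 0)  # size increases along the row
bottom_left = (1, 3)                  # shade increases down the column

# Route 1: relation educed along the top row, applied to the bottom-left figure.
by_row = educe_correlate(bottom_left, educe_relation(top_left, top_right))
# Route 2: relation educed down the left column, applied to the top-right figure.
by_column = educe_correlate(top_right, educe_relation(top_left, bottom_left))

# Six alternatives offered to the subject, several of them "near correct."
alternatives = [(1, 0), (2, 0), (1, 3), (2, 3), (3, 3), (2, 1)]
answer_index = alternatives.index(by_row)

print(by_row, by_column, answer_index)
```

That the row route and the column route arrive at the same figure is what is meant by calling such an item overdetermined.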

5. Relation Matrices: Second Order.—Raven has extended the above 
type of matrix to include nine figures. The required operation can 
be gradually inculcated by the steps shown in Figs. 7 and 8. It
seems to have been overlooked in designing this matrix type that the
second row and the second column are unnecessary in educing the
relations which define the missing item. Or, regarded from another
angle, the third row and column are unnecessary, providing, as is
usually the case, the relation between two and three is the same as
between one and two, i.e., if the trend is continuous.

An improvement is, therefore, possible in this matrix design, con- 
sisting in requiring the subject to perform a more complex relational 
operation on the same simple perceptual material. He is now (after
the above introduction) given only the first two figures in the first
row and column. (Fig. 9.) From applying the relation between the 
first and second to the second he arrives at the third figure. From 
the relation of the first and third figures in the row, now applied to 
the third in the column, he arrives, as in the first order relation 
matrix, at the missing item. 
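
The two-step operation of this modified matrix can be sketched under stated assumptions: each figure is encoded as a hypothetical (size, shade) pair, the trend along the row and down the column is continuous, and the helper names `step` and `apply_step` are illustrative only, not part of the test.

```python
def step(a, b):
    """Attribute-wise change from figure a to figure b."""
    return tuple(y - x for x, y in zip(a, b))


def apply_step(figure, change):
    """Figure obtained by applying a change to a figure."""
    return tuple(x + d for x, d in zip(figure, change))


# Only these three figures are shown to the subject.
first = (1, 0)        # corner figure, shared by first row and first column
row_second = (2, 0)   # second figure of the first row
col_second = (1, 2)   # second figure of the first column

# Step 1: continue each trend to reach the unseen third figures.
row_third = apply_step(row_second, step(first, row_second))
col_third = apply_step(col_second, step(first, col_second))

# Step 2: the relation of first to third in the row, applied to the
# third figure in the column, yields the missing corner item.
missing = apply_step(col_third, step(first, row_third))

print(row_third, col_third, missing)
```

The subject must hold the two intermediate figures in mind without seeing them, which is the added relational load the modification is intended to supply.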

6. Sequence Matrices.—Both Stephenson and Raven have
found that the nine-item matrix may be used as an intelligence test
also when the determination of the missing item depends upon a
perception of sequence (conjunction relation) instead of relations of
the above kind.

The new subtest is introduced first by horizontal and then by horizontal
and vertical “sequences,” “rhythms,” or “cycles.” (Fig. 10.)

Then, first, the horizontal and, secondly, the vertical rhythms may
be “staggered” or set in different phases. (Figs. 11 and 12.) Finally





1 Penrose, L. F., and Raven, J. C.: “A New Series of Perceptual Tests:
Preliminary Communication.” Brit. Journ. Medical Psych., Vol. xvi, 1936.
the “staggered” sequence relation can be combined with the original
straight column sequence and even a second order relation eduction in a
dizzy palimpsest of superposed relationships, which may be further
complicated by reducing the given figures to four. No research has
yet shown whether such steps increase the “G” saturation, but, since
each relationship applies to a different aspect of the figures, it is possible
that the gain as an intelligence test, resulting from more complex
relation play, is more than offset by the introduction of
some, presumably temperamental, factor invoked by the greater need
to “search around” for the fundaments of the relations. (Figs. 13
and 14.)
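
A staggered sequence matrix of the simpler kind can be generated mechanically, which makes the notion of “phase” concrete. In the sketch below the cycle of figures is represented by letters, the function name `sequence_matrix` and its `stagger` parameter are hypothetical, and only horizontal rhythms with a row-by-row phase shift are modelled; the superposed relations described above are not attempted.

```python
def sequence_matrix(cycle, rows, cols, stagger=0):
    """Build a matrix whose rows repeat `cycle`, each row starting
    `stagger` places later in the cycle than the row above it."""
    n = len(cycle)
    return [
        [cycle[(col + row * stagger) % n] for col in range(cols)]
        for row in range(rows)
    ]


plain = sequence_matrix("ABC", rows=3, cols=3)                # rows in phase
staggered = sequence_matrix("ABC", rows=3, cols=3, stagger=1) # rows out of phase

# The item to be supplied by the subject is the bottom-right cell.
missing = staggered[2][2]

for row in staggered:
    print(" ".join(row))
```

In the staggered form every column as well as every row contains each figure once, so the missing cell can be reached by either route.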







































































The variety of forms used in this matrix test, requiring continued
re-orientation on the part of the subject, may be helpful in eliminating
gain from “test sophistication.”

7. Mirror Images.—The images are mirrored about a horizontal
axis, in order that the universal experience of seeing reflections in a
pool may be utilized in the instructions. Items are made more difficult
by rearranging masses rather than by increasing detail. (Fig. 15.)


TYPES OF RELATION FOUND IN PERCEPTUAL TESTS 


Because of the theoretical interest and practical problems of
test construction that associate themselves with the thesis that “G”
is coincident with relation and correlate eduction, it is desirable to
pause and ask what relations can be employed in perceptual “G” tests
and how they stand with respect to relations in general. Spearman¹
has classified all possible relations in the following eleven categories:

(1) Real: Space; Time; Psychological (Object-Subject); Identity;
Attribution; Causation; Constitution.

(2) Ideal: Evidence; Likeness; Conjunction; Intermixture.

Fig. 15.


























The distinction between real and ideal is, as Spearman shows, a
traditional “metaphysical nicety.” Clearly all these relations admit
of being built up from relations in space and time (and consciousness);
they are higher order relationships in a hierarchy which has space and
time (and consciousness, e.g., intensity of sensation) as its base.

In perceptual tests, as a glance at the above examples will show,
we deal with relations of “bigger,” “darker,” “re-orientated,”
“added,” or “multiplied,” or “divided,” “different in proportions,”
“more curved,” “more uniform,” “truncated,” etc. These relations,
Spearman has shown,² are resolvable into distance, direction, and
likeness, being based on fundaments of blackness and position. But
this overlooks the utilization of shades of greyness (i.e., intensity of





1 Spearman, C.: Abilities of Man, 1927.
2 Spearman, C.: “Intelligence Tests.” Eugenics Review, Vol. xxx, 1939.

sensation) which makes another available fundament. Valid analogies
and series tests have already been made on grey intensities alone.¹

The fundamental relations possible in perceptual tests of this kind 
are, therefore, those of space, visually perceived, and visual sensation 
qualities. But out of these can be built relations of Similarity, 
Attribution, Identity, Constitution, Conjunction, Intermixture and, 
indeed, all higher order relations save those which involve Time and 
the Psychological relations, e.g., Causality and certain relations of 
Evidence. 

Confining the operations to those that depend on spatial relations
does not seem to have reduced the universality of “G” in these tests;
perhaps because so many higher order relations can be built on those 
of space; perhaps because most people handle problems in which time 
is conceptually involved by thinking of it in spatial images, e.g., as in 
school history charts or in Galton’s clock images of time. 

It is interesting to note that the perceptual time relation has already
been independently used by Porteus² in his pragmatic approach to a
culture-free intelligence test, in the subtest in which the subjects
listened to a sequence of tones on a xylophone. This is an illustration
of the position one can reach on theoretical grounds: that a perceptual
intelligence test could be built out of relationships from fundaments
in any sense modality, seeing, hearing, tasting, etc. The extent of the
“G” factor in Seashore’s musical aptitude tests shows that with more
attention to higher relations of rhythm, pitch, and intensity, sound
fundaments could equally be used for an “intelligence” test. Because
of the weight of apparatus required for experiments with most
non-visual senses, and because of the danger that physical sense defects
might become important, it seems best to restrict perceptual tests to
vision and specifically to non-color vision, in optically groomed
populations. There is considerable opportunity for research, however, into
the sensory range in which perceptual tests are practicable and
advantageous.


MOTIVATION IN TEST PERFORMANCE


Although the present test is intended primarily for studying
intelligence differences in social and cultural divisions of civilized





1 El Koussy, A. H.: “A Note on the Greys Analogy Test.” Brit. Journ. Educ.
Psych., Vol. iv, 1938, p. 294.

2 Porteus, S. D.: The Psychology of a Primitive People, 1931.

countries, it should admit of being used also with primitive peoples.
A priori there would seem to be nothing objectionable for this
purpose in the cognitive material of the perceptual test, but
its use could perhaps be criticized on orectic grounds, for the
interests, habits of attention, and normal speeds of working of
primitive peoples are widely different from those assumable in school
educated populations.¹


Apart from introducing intrinsic and “play” interest² into the
objects to be manipulated, allowing indefinite time and giving variety
of subtest, it might appear that nothing has been done here to make
the test as universally applicable orectically as cognitively.

Research has shown³ that increases of motivation beyond a certain
minimum level of concentration do not produce increases in intelligence
test score. The number of items attempted increases, but so does the
number of errors. An army may gain on a narrow salient by
concentrating reinforcements, but it cannot concentrate on all fronts at
once. Similarly, effort may improve some narrow specific skill but
seems powerless to increase the general mental capacity. Subtests
involving certain special factors, notably inferences, require, however,
more effort than others, e.g., opposites.³ There is no reason to suppose
that these findings regarding effort need be restricted to civilized people
with ready-made attitudes of attention in examination situations.
If the individual can be made to attend to a normal extent, by
incentives of food, prestige reward, gifts, threats, or any of the numerous
possible motivation sources, his intelligence can be measured, and
more powerful motivation is not required to increase the accuracy
of the measurement.

Further research as to the interchangeability of motives and the
effects of varying intensity of motives is required; but the indications
from the existing research are that the question of motive is best solved
ad hoc by the field-worker on the spot, who can best judge what
adequate motives he may stimulate in various groups. The work of
Porteus shows that the tactful experimenter can induce a proper test
attitude in even the most barbarous peoples, by studying their
incentive systems. Accurate use of the present test in such situations is
intended to be facilitated by the practice test and by administering the
test individually.

1 Klineberg, O.: “Racial Differences in Speed and Accuracy.” Journal of
Abnormal and Social Psychology, Vol. xxii, pp. 273-277.

2 In the research quoted (Cattell and Bristol, “Intelligence Tests for Mental
Ages of Four to Eight Years”) the writers experimented with food (candy) in
puzzle boxes for children of four to eight years. The correlations were no better
than for tests done under “please the experimenter” motivation.

3 Wild, E. H.: “Influence of Conation upon Cognition; Part II.” Brit. Journ.
Psych., Vol. xv, 1927.


COMPARABLE TESTS OF VERBAL AND NON-VERBAL
REASONING: THEIR CONSTRUCTION AND
APPLICATION TO DEVELOPMENTAL PROBLEMS


LEON BRODY 
New York University 


It has been frequently suggested that some individuals are more 
fairly tested by non-verbal tests of intelligence than by verbal tests; 
indeed, that both types of tests should be administered to obtain a 
meaningful and reliable picture of anyone’s mental ability. It is 
difficult, however, to interpret the results of such testing, because of 
lack of comparability between ordinary verbal and non-verbal tests of 
intelligence. Among other things, they differ in scaling methods, 
norms, number and kinds of cases on which they are standardized, and 
level of difficulty of test items. It is the major purpose of this article 
to report the construction and standardization of unusually comparable 
tests of verbal and non-verbal reasoning. Two points must be borne
in mind throughout this report: First, there is no claim of complete
comparability; second, the qualifying adjectives, “verbal” and
“non-verbal,” are here considered to refer only to the objective character of
the test materials themselves.

The verbal test items were of four kinds: Concrete-classification, 
employing the classification type of reasoning item with reference 
solely to real objects or living things, such as knife, clock, cow; abstract- 
classification, involving so-called abstract situations, such as love and 
fear (as opposed to real, tangible objects or living things); concrete- 
analogies, employing the analogies type of item with reference solely to 
real objects or living things; and abstract-analogies, referring to 
abstract situations, as defined above. The non-verbal test items 
included exact pictorial representations, in classification and analogy 
forms, of the aforementioned verbal concrete items; and quasi- 
abstract reasoning items involving geometric symbols and figures in 
classification and analogy forms. Thus, more or less complete com- 
parability in content was achieved between verbal and non-verbal 
concrete items; but the verbal and non-verbal abstract items are alike 
only in a common omission of reference to real, concrete objects or 
living things. 

Examples of the form and content of these test items are given in
Fig. A.

[Fig. A. Sample test items. Panels: non-verbal concrete and non-verbal
abstract test items (classification type); non-verbal concrete and
non-verbal abstract test items (analogy type); verbal concrete and verbal
abstract test items (classification type); verbal concrete and verbal
abstract test items (analogy type). Legible fragments of the verbal
items: “3. razor,” “4. knife”; “man woman boy dog girl baby book.”]

The first form of this test consisted of more than double the number
of items intended for ultimate use. It was given to a group of twelve
adults whose ages ranged from twenty-two to thirty-five. Of this
number, five were college graduates, four had attended college for some
period of time, and three had only a high-school education. These
individuals independently, and also among themselves, considered the
possible validity of each item, its accuracy, and even its relative
difficulty. Altogether, they spent several hours analyzing the test and
considering each item not only from their own point of view, but also
from the likely point of view or reaction of school children and
high-school youth. Their opinions and differences were noted with respect
to each item, as well as their general attitude toward the mechanical
set-up of the test. On the basis of these notations and of the author’s
subjective opinion, a preliminary test was devised in two forms, lower
and higher, the latter containing the more difficult items. Each form
consisted of two parts, verbal and non-verbal. The following chart
indicates the nature of these parts:























                                  Number of items
Non-verbal test           NVC-cl   NVA-cl   NVC-an   NVA-an
  [row label illegible]     23       23       23       23
  [row label illegible]     23       22       22       22
Verbal test               VC-cl    VA-cl    VC-an    VA-an
  [row label illegible]     23       23       23       23
  [row label illegible]     23       23       22       23

Key:
  V: Verbal.          A: Abstract.
  NV: Non-verbal.     cl: Classification.
  C: Concrete.        an: Analogies.


The items were grouped by form and content, all verbal concrete items 
in classification form being kept together, and so forth. In each of 
these groups or sub-tests, items were arranged in estimated order of 
difficulty. As in the case of the previous form, two special precautions 
were observed. First, care was taken that the correct answers were 
fairly well distributed as to position; second, in the verbal portion of 
the test, and consequently in the case of parallel items in the non-verbal 
form, changes were made to avoid the element of vocabulary difficulty, 
in order to get at real and important reasoning relationships. 

This new form of the test was reproduced entirely by photo-offset 
and given more permanent body than the previous form. 


PRELIMINARY TESTING 


Preliminary testing was conducted in two schools, one an
elementary school (Public School 63X, New York City) limited to the first six
grades, and the other a junior high school (Public School 44X, New
York City). In the former, only the lower form was administered.
In the latter, one group of students received the higher form and
another received the lower form. These schools were fairly typical
of the schools of New York City, and drew their students from an
average population.

The group tested in the elementary school was constituted as 
follows: All pupils were drawn from the fifth grade, five classes being 
represented. Of this number, one was a bright class that included 
thirty-eight students; one was a slow class that included thirty-one 
students; one was a very bright class that included twenty students; and 
two were so-called normal classes that included sixty-three students. 
The differentiation had been made on the basis of teachers’ judgments, 
grades received, and intelligence ratings. 

In the junior high school all students tested were drawn from the 
ninth grade. They included one bright class of thirty-two; one slower 
class of thirty-five; and two so-called normal classes totaling sixty 
students. Here, however, the differentiation was not as distinct as in 
the case of the elementary school. The two normal classes received 
the lower form of the examination, while the other two received the 
higher form. 

The directions given were the usual ones for the types of items 
employed. While the preliminary testing was being administered, 
time limits were determined by observation of activity and by request 
for a show of hands as soon as the pupils had finished each subtest. 
The following time limits were obtained: 


                                      MINUTES
[subtest label illegible]                4
[subtest label illegible]                4½
[subtest label illegible]                5½
[subtest label illegible]                5


Later, in the final administration, one-half minute was cut down from 
each of the subtest time limits for grades IV through VI, and another 
half minute for grades VII through XII, because that amount of time 
became really excessive, as evidenced by the behavior of the pupils. 

After the tests had been administered, they were all scored by
determining the number of correct answers for each subtest, and by
totalling these figures to get the total scores for both verbal and
non-verbal forms. Tables I, II, and III present all data so determined,
and derived data necessary for the revision of the preliminary form of
the test.


The Journal of Educational Psychology


Bright group: twenty pupils.
Normal group: one hundred forty-nine pupils (including bright and slow).
Slow group: thirty-one pupils.


TABLE I. BASIC DATA FOR LOWER FORM

           Range of scores         Mean              Sigma        CR of difference
Test         5A        9A        5A       9A       5A      9A       between M's
NVC-cl      0-22     15-23     15.12    19.93     4.78    1.67         10.7
NVA-cl      0-18      8-18     10.06    14.20     3.58    2.40          9.9
NVC-an      0-17      5-22      8.34    14.06     4.19    3.51          9.9
NVA-an      1-19      5-21      8.32    15.06     3.71    3.93         11.6
VC-cl       2-22      9-23     13.63    19.30     4.23    2.68         11.3
VA-cl       1-17      7-20      8.67    15.36     3.66    3.12         13.4
VC-an       1-17      4-22      9.87    15.65     3.35    3.97         10.0
VA-an       2-22      6-19      9.68    14.85     3.33    2.48         12.3
Total      30-134    80-160    83.73   128.43    20.36   16.52         16.6

N (5A) = 149
N (9A) = 60


TABLE II. BASIC DATA FOR HIGHER FORM

Test       Range of scores     Mean     Sigma
NVC-cl          3-14           10.02     2.63
NVA-cl          2-15            7.89     2.75
NVC-an          3-14            7.97     2.38
NVA-an          2-16            7.25     2.93
VC-cl           5-16           10.22     2.54
VA-cl           4-18           11.37     3.06
VC-an           4-16            8.86     2.73
VA-an           3-19            7.94     2.76
Total          49-110          71.55    12.45

N (9A) = 67














(1) On the lower form, there is a very decided difference between the
means of the ninth-graders and the means of the fifth-graders, on the
individual subtests as well as on the entire test. The reliability formula
gives a rather convincing figure, thus achieving one of the major
objectives of this preliminary testing.

(2) With one or two exceptions, the means appear to be rather uniform
for all tests on each of the two grade levels. This fact, together with the
consistent uniformity in magnitude of the various sigmas within the fifth
grade as well as within the ninth grade, points to a rather desirable uniformity
of difficulty among the various subtests in the lower form.

(3) The fifth-grade classes tested included three normal classes, one bright 
class, and one slow class. The pupils fell into these categories on the basis 
of three factors; namely, intelligence test ratings, school achievement, and 
teachers’ judgment. A comparison of the scores of each of these groups on 
each of the subtests, as well as on the entire test, indicates a gradation which is 
immediately obvious to the eye, and entirely as expected. In fact, the 
differences and trends seem to be remarkably consistent, especially in view of 
the fact that the pupils are all of the same grade and approximately the same 


TABLE III. COMPARISON OF LOWER FORM SCORES MADE BY BRIGHT AND
SLOW 5A GROUPS

               (1)                           (2)
Test        Mean score,   Mean score,   Mean score,     Per cent
              bright        normal         slow       (2) is of (1)
NVC-cl         17.4          15.1           7.3            42
NVA-cl         11.4          10.1           8.4            74
NVC-an          9.0           8.3           7.0            78
NVA-an         10.5           8.3       [illegible]        68
VC-cl          16.4          13.6           9.6            59
VA-cl          11.1           8.7           6.5            59
VC-an          11.8           9.9           7.8            66
VA-an          10.9           9.7           8.1            74
Total          98.2          83.7          67.3            68

















age. The number of cases is too small to warrant reliability data on the 
differences between the means of the bright and slow groups. In general, the 
average slow group score is only two-thirds as large as the average bright group 
score, which is a fairly significant difference. The latter becomes even more 
pronounced in the case of individual subtests. These data bear out another 
important objective of the preliminary testing. 

(4) In general, similar desirable findings occur for the higher form. 
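
The critical ratios reported in Table I can be approximated by a short
calculation. The sketch below assumes the conventional formula, the
difference between two means divided by the standard error of that
difference; the article does not state which formula was used, and the
function name is illustrative only. Applied to the entire-test row it
gives about 16.5 against the printed 16.6, so the author's computation
may have differed in some detail.

```python
from math import sqrt


def critical_ratio(mean_a, sigma_a, n_a, mean_b, sigma_b, n_b):
    """Difference between two means divided by the standard error of the difference.

    Assumed formula (not stated in the article):
        SE_diff = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
    """
    se_diff = sqrt(sigma_a ** 2 / n_a + sigma_b ** 2 / n_b)
    return (mean_b - mean_a) / se_diff


# Entire-test row of Table I: 5A (N = 149) against 9A (N = 60).
cr_total = critical_ratio(83.73, 20.36, 149, 128.43, 16.52, 60)

# NVC-cl row of Table I, for comparison with the printed 10.7.
cr_nvc_cl = critical_ratio(15.12, 4.78, 149, 19.93, 1.67, 60)

print(round(cr_total, 1), round(cr_nvc_cl, 1))
```

A ratio of this size is far beyond the value of about 3 conventionally
required, which is consistent with the "very decided difference" noted
in (1).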


The next step was to revise the test for final administration. While 
originally it had been planned to test college students as well as the 
levels already tested, this plan was abandoned because of administra- 
tive difficulties. Instead, it was decided that the lower form would, 
with some revision, suffice for testing through the high-school levels. 
With respect to revision, certain data obtained through the preliminary
testing were significant. In the case of NVC-cl, it was obvious that
the mean scores were decidedly too high on both the 5A and 9A levels, 
especially on the former. In other words, this test was too easy. To 
remedy this, it was decided to eliminate those items that failed to 
differentiate between 5A and 9A on the whole. This was followed by 
the addition from the higher form of a few items which were passed by 
forty to sixty per cent of 9A pupils, and, especially for the sake of 
grades X to XII in the final testing, a few items from the higher form 
passed by less than forty per cent of 9A pupils. Finally, those items 
were eliminated which failed to discriminate between the brighter and 
the slower pupils of the fifth grade. 
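
The selection rules just described can be sketched as a small item
analysis. This is an illustration under stated assumptions, not the
author's procedure: the article gives no numeric criterion for "failed
to differentiate," so the `min_gap` threshold, the function names, and
the sample passing rates below are all hypothetical. Only the forty
and sixty per cent bands come from the text.

```python
def passing_rate(responses):
    """Fraction of pupils answering an item correctly (responses are 0 or 1)."""
    return sum(responses) / len(responses)


def keep_lower_form_item(rate_5a, rate_9a, rate_bright, rate_slow, min_gap=0.10):
    """Keep an item only if it separates 5A from 9A and bright from slow 5A pupils.

    min_gap is a hypothetical threshold; the article states no numeric criterion.
    """
    separates_grades = (rate_9a - rate_5a) >= min_gap
    separates_ability = (rate_bright - rate_slow) >= min_gap
    return separates_grades and separates_ability


def higher_form_band(rate_9a):
    """Classify a higher-form item by the share of 9A pupils passing it."""
    if 0.40 <= rate_9a <= 0.60:
        return "add"                   # passed by forty to sixty per cent of 9A pupils
    if rate_9a < 0.40:
        return "add_for_upper_grades"  # harder items, a few added for grades X to XII
    return "skip"                      # too easy to raise the difficulty of the subtest


# Hypothetical passing rates, for illustration only.
example_rate = passing_rate([1, 0, 1, 1])
kept = keep_lower_form_item(rate_5a=0.35, rate_9a=0.80, rate_bright=0.55, rate_slow=0.20)
dropped = keep_lower_form_item(rate_5a=0.90, rate_9a=0.93, rate_bright=0.95, rate_slow=0.88)
bands = [higher_form_band(r) for r in (0.52, 0.31, 0.75)]
print(example_rate, kept, dropped, bands)
```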

In the case of NVA-cl, the same treatment was necessary, although
the means here were not as extreme as in the preceding case. What- 
ever changes were made in NVC-cl and NVC-an, were made in VC-cl 
and VC-an in order to maintain identity of items in the comparable 
tests. There was no real reason for having the non-verbal form of 
these items serve as the pattern for revision of the verbal form; but 
it was decided that, inasmuch as some control was needed, the non- 
verbal form might just as well serve this purpose. 

The final step in revision of the test was concerned with the rear- 
ranging of the test items in each subtest in order of difficulty, a few 
items having to be dropped altogether because of exceedingly high or 
low percentage of failure. 


FINAL TESTING 


The revised form of the test, now with one sample and twenty-one 
regular items in each subtest, was administered to more than fifteen 
hundred pupils of four New York City public schools. Grades IV 
through VI were tested at Public School 63X on February 10, 11, and 
15, 1937; grades VII and VIII were tested in Public School 233, 
Brooklyn, on February 25 and March 2; grade IX was tested at the
Winthrop Junior High School, Brooklyn, on February 24 and 26; 
grades X through XII were tested at the Thomas Jefferson High School 
in Brooklyn, on March 3 and 5. In all cases, the non-verbal test was 
given first and the verbal test on the later date. 


VALIDITY AND RELIABILITY OF THE TESTS 


The usual interpretation of validity of a measuring instrument is 
that it refers to the extent to which it measures whatever it is supposed 
to measure. There is no direct way of estimating the validity of a
particular test. In consequence of such a limitation, the trend today is
to define tests more or less strictly in terms of what they actually do. 
It is for this reason that the present tests are considered as tests of 
specific types of verbal and non-verbal reasoning rather than as tests 
of reasoning as such, or as tests, verbal and non-verbal, of intelligence. 

Nevertheless, such specification does not preclude the consideration 
of validity. As long as there is a purpose for which a test is con- 
structed, validity must remain a consideration. The method here used 
of insuring a desirable amount of validity has been described. Briefly, 
the method revolved about the personal evaluation of the test items by 
the author of the test, in the light of the purpose he had previously 
decided upon. This subjective method was checked by careful con- 
sideration of each item by a group of adults that sampled various 
educational levels, and by distinctions observed in the preliminary 
testing and data based thereon. 

There are several ways of measuring the reliability of a test. Here,
the reliability of half of the test was measured by correlating the
odd-even sets of scores. Then, by means of Spearman’s formula, the
reliability of the whole test was determined. Table IV presents the
essential data.
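
The odd-even procedure can be illustrated as follows. This is a minimal
sketch, assuming that each pupil's record is a list of item scores
(1 for a correct answer, 0 otherwise) and that the step from half-test
to whole-test reliability uses the Spearman-Brown correction for a
doubled test. The pupil records and function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_scores(item_scores):
    """Return (score on odd-numbered items, score on even-numbered items)."""
    odd = sum(item_scores[0::2])   # items 1, 3, 5, ...
    even = sum(item_scores[1::2])  # items 2, 4, 6, ...
    return odd, even


def split_half_reliability(pupils):
    """Correlate the odd and even half-scores, then step up to full length."""
    halves = [odd_even_scores(p) for p in pupils]
    r_half = pearson_r([h[0] for h in halves], [h[1] for h in halves])
    return 2 * r_half / (1 + r_half)   # Spearman-Brown correction for doubled length


# Hypothetical records: five pupils, six items each.
pupils = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
]
reliability = split_half_reliability(pupils)
print(round(reliability, 2))
```

The correction is needed because each half is only half as long as the
test actually used, and a shorter test is less reliable than a longer
one of the same kind.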


TABLE IV. RELIABILITY COEFFICIENTS AND STANDARD DEVIATIONS OF WHOLE
SUBTESTS AND ENTIRE TEST

Grade          NVC     NVA       VC         VA      NV       V     Entire test¹
V       r      .82     .82   [illegible]    .80     .91     .80        .94
        SD    6.12    5.96      5.52       5.61   10.63    9.44
VIII    r      .77     .86       .73        .79     .90     .84        .94
        SD    5.80    6.48      5.82       5.81   11.00   10.64
XI      r      .80     .76       .74        .77     .86     .82        .93
        SD    5.41    5.15      4.60       4.75    9.28    8.46

N = sixty in all cases, except for NVA, eleventh grade, where N = fifty-three.
1 Obtained by averaging the reliability coefficients of NVC, NVA, VC, and VA,
and by substituting the result in Spearman’s “Prophecy” formula, using N as 4.
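
The footnote's computation for the entire test can be checked against
the table. The sketch below assumes the usual form of the prophecy
formula, r_n = n r / (1 + (n - 1) r), with n = 4 and r the average of
the four subtest coefficients; the function and variable names are
illustrative. Grade V is omitted because its VC coefficient is not
legible.

```python
def spearman_brown(r, n):
    """Reliability predicted for a test n times as long as one with reliability r."""
    return n * r / (1 + (n - 1) * r)


# Subtest coefficients (NVC, NVA, VC, VA) as printed in Table IV.
subtest_r = {
    "VIII": [0.77, 0.86, 0.73, 0.79],
    "XI": [0.80, 0.76, 0.74, 0.77],
}

entire_test_r = {
    grade: round(spearman_brown(sum(rs) / len(rs), 4), 2)
    for grade, rs in subtest_r.items()
}
print(entire_test_r)  # {'VIII': 0.94, 'XI': 0.93}
```

Both values agree with the "Entire test" column, which supports
reading the footnote as a four-fold lengthening of an average subtest.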


The reliability coefficients obtained apply to cases chosen at random
from three grades: V, VIII, and XI. These levels were selected because
they are quite representative of all the grades tested, each being the
middle grade of three consecutive grades. It was not possible, nor
would it have been feasible, to determine the reliability coefficients
of the subtests and the entire test for each one of the grades involved in
the entire testing program. For each of the three grades represented,
the reliability coefficients for NVC, NVA, VC, and VA are remarkably
consistent and, furthermore, notably high. The latter fact is more
significant when it is pointed out that there were only forty-two items
in each of these tests.

The reliability coefficients for NV and V are, of course, even
higher, particularly in the case of NV. In fact, there can be no
question as to the reliability of NV, and little question as to the reliability
of V, for the grades tested. The coefficients for the entire test are
very high. However, the entire test is of less concern than the
subtests.


VERBAL AND NON-VERBAL REASONING IN RELATION TO AGE 


Much has been written on the growth of intelligence as such, but 
the growth of intelligence in terms of its verbal and non-verbal aspects, 
if such a distinction is permissible, is a relatively unexplored field. 
This article does not purport to submit a solution of this particular
problem. It seeks rather to open the way for such a study by
analyzing the development, with age and school grade, of certain types of
so-called verbal and non-verbal reasoning.

The data in Tables V and VI, together with Figs. 1 and 2—obtained 
from the results of the final testing—furnish basic information about 
the development of the four major types of reasoning considered in 
this study. 

There appears to be a consistent and fairly rapid growth in all of the
functions from the lowest age level concerned, namely, 78-101 months,
to the age level of 174-185 months. Thereafter occur a break and
leveling off. The non-verbal functions apparently cease to develop
further. In the case of the verbal functions, however, a new rapid rise
is manifested, at least between the ages of one hundred ninety-eight
months and two hundred thirty-three months.

The brief decline is probably unnatural. The group of pupils
tested in the Winthrop Junior High School were test-minded, and this
fact is consistently borne out by the various scores obtained. The
average age of this group is 170.69 months, which places their scores at
about the peak for each function just before the fall and leveling of the
curves. In other words, it is quite likely that a more average group
of individuals about that age would have scored lower, which would

TABLE V. AGE DATA FOR SUBTESTS AND ENTIRE TEST

                      NVC (I)        NVA (II)         VC (III)       VA (IV)       Entire test
Age (months)   N    Sigma   Mean   Sigma   Mean     Sigma   Mean   Sigma   Mean   Sigma    Mean
 78-101        42   5.09   16.83   6.07   15.95     5.26   18.59   4.43   14.38   16.73   65.76
102-113       174   6.67   19.37   7.46 [illegible] 6.87   18.79   6.15   14.83   23.14   72.71
114-125       192   6.04   21.55   6.18   20.19     6.90   21.22   6.00   15.69   21.61   78.67
126-137       167   6.21   23.97   6.40   23.04     7.02   25.00   6.36   18.62   22.24   90.64
138-149       167   5.62   25.52   6.82   24.49     7.22   26.25   6.78   21.23   22.92   97.50
150-161       166   6.17   26.96   6.56   26.38     6.47   29.62   7.35   24.50   22.96  107.47
162-173       200   5.58   28.61   6.24   29.11     5.70   31.04   6.14   26.75   20.65  115.52
174-185       161   5.05   29.52   5.96   29.22     4.44   32.83   5.85   29.10   17.59  120.68
186-197       147   5.02   28.48   5.30   28.38     4.99   31.46   5.74   28.65   17.24  116.99
198-209        71   6.43   28.64   5.78   27.90     5.05   31.50   6.02   28.69   19.82  116.74
210-233        27   6.39   29.11   6.23   27.48     3.66   33.74   4.20   31.33   14.97  121.66


TABLE VI. GRADE DATA FOR SUBTESTS AND ENTIRE TEST

                 NVC (I)        NVA (II)       VC (III)       VA (IV)       Entire test
Grade    N     Sigma   Mean   Sigma   Mean   Sigma   Mean   Sigma   Mean   Sigma    Mean
IV      290    6.48   18.75   6.72   17.64   6.45   18.23   5.41   13.43   20.74   68.07
V       142    6.15   21.40   6.20   20.36   5.87   22.42   5.11   16.41   19.58   80.60
VI      264    5.46   25.15   6.16   23.90   6.85   25.67   6.21   19.38   20.68   94.12
VII     141    5.01   26.17   5.95   25.09   5.01   28.12   5.77   23.17   18.08  102.56
VIII    138    5.31   26.54   6.10   26.78   5.26   30.09   5.71   25.02   18.56  108.45
IX      201    4.37   30.83   4.75   31.13   4.00   33.46   4.43   29.27   13.94  124.71
X       137    5.23   27.86   5.63   27.67   4.78   31.04   5.45   28.27   17.15  114.85
XI      116    5.08   29.05   5.09   29.20   4.40   32.50   4.76   29.63   15.44  120.39
XII      85    5.96   30.41   5.88   30.02   3.70   34.43   4.15   32.83   15.24  127.70


























have meant, at least, that there would not have been the decline 
pictured in the graph. 

It is interesting to note the position of each function in relation to
the other at the lower extremity of the age scale. The functions are
seen to rank in the following order: Verbal concrete, non-verbal
concrete, non-verbal abstract, and verbal abstract. This order is
maintained fairly consistently up to the point of leveling-off; but when the
functions emerge from the section of the graph characterized by the
latter, a new order presents itself, namely, verbal concrete, verbal
abstract, non-verbal concrete, and non-verbal abstract. In other
words, at the beginning it would appear that for the age groups
involved, thinking in terms of concrete things is easier than thinking in
terms of abstract ideas or symbols; also, that in the case of the concrete
items, verbality is slightly preferred over non-verbality, while in the
case of the abstract items, non-verbality is definitely preferred over
verbality. However, at the other end of the age scale we notice that,
in general, verbality takes preference over non-verbality, and that in
each instance scores are higher for concrete items than for the
corresponding abstract items.

All this may be accounted for, perhaps, in this way: It is to be 
expected that young children will be able to deal more easily with 
concrete materials, whether in the form of words or pictures, or even 
symbols, the latter referring in a sense to NVA. However, with 
increase in age, there comes a much greater familiarity and facility 
with words. The individual becomes increasingly accustomed to 
thinking in terms of linguistic symbols for real objects and situations, 
and he becomes less and less accustomed to deal with pictures or 
similar representations—something the child is more confronted with 
than older people. This, of course, does not mean that his ability to 
deal with non-verbal materials is less than that of someone several 
years his junior. The scores show that that is not so. But it is a 
functional development, which seems to parallel the usual picture 
representing the growth of intelligence. Whatever conclusions may be 
drawn from the facts presented, it must be remembered that age 
represents a totality of influences, not something intrinsic to the
individual and apart from his past, present, and future environments.

It should be pointed out here that one reason accounting for the
fairly consistent superiority of VC over NVC may be the fact that in
all cases, the non-verbal test was presented before the verbal test, and
since NVC and VC are identical in content, theoretically speaking,
there may have been a significant carry-over from the first to the
second. The question as to whether or not the carry-over was
significant cannot be answered on the basis of the available data. It
would require a controlled experiment rotating the order of the tests
to determine this. It may be that the carry-over is not significantly
responsible for the superiority of VC over NVC, in view of the fact that
several days usually intervened between the presentation of the first
test and the presentation of the second. Furthermore, the difference
in form, that is, the verbality as opposed to the non-verbality, with the
concomitant differences in mechanical set-up, should have hindered
transfer somewhat.


VERBAL AND NON-VERBAL REASONING IN RELATION TO GRADE IN 
SCHOOL 


If grade may be considered a measure of school achievement or an 
indication of the level of scholastic attainment, it should be of interest 
to study the development of the various reasoning functions measured 
by the present tests, in relation to this factor. Table VI presents the 
sigmas and means obtained from the administration of these tests to 
the pupils of nine consecutive school levels, namely, grades IV through 
XII. This table, together with Fig. 2, is the major source of the 
information presented in this section. 

It is apparent that for all of the functions involved, there is a very 
rapid and consistent rise from grade IV through grade VI. Around 
this point, there is a slight negative acceleration evidenced by the 
curves of the non-verbal functions. The verbal functions maintain 
their rate of growth. After grade VIII, there is a rather steep rise in 
the case of all the functions. As explained before, this rise is probably 
not natural to the functions measured, it being occasioned by the 
unusual testing background of the ninth-grade pupils tested. From 
the appearance of the curves, it seems safe to assume that with an 
ordinary ninth-grade group, the “abnormality” of this rise would be
almost completely eliminated. There follows, as is expected under 
these circumstances, an obvious decline between grade IX and grade X; 
but thereafter the upward trend is again manifested, particularly in the 
case of the verbal functions. 

In consideration of the individual functions, it is evident that verbal 
abstract reasoning has the slowest start. The other three functions 
are more or less concurrent up to grade VI, at which point verbal 
concrete reasoning draws apart from the rest, maintaining this
superiority up to the end of the scale. Its pictorial counterpart lags behind
it, and eventually falls behind both types of verbal reasoning. Verbal 
abstract reasoning is throughout considerably removed from verbal 
concrete reasoning; but between grades IX and X it passes both 
non-verbal reasoning functions and develops rapidly in a way that 
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indicates it might soon catch up to verbal concrete reasoning. In 
general, there is great similarity between the age and grade curves of 
development. 

The reliability formula (Diff/σ_diff), when applied to the differences
between the means of parallel tests (NVC and VC; NVA and VA) and
to the differences between the means for consecutive age groups and
for consecutive grades, demonstrates that in all probability the
developmental trends are as pictured. For example, the chances are
ninety-nine out of one hundred (Diff/σ_diff = 2.44) that the difference
between the VA means of age group “198-209 months” and age group
“210-233 months” is a true one; and we can be equally certain
(Diff/σ_diff = 2.40) of the difference between the VC means of these
groups. Table VII illustrates further the reliability of the
relationships found.
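
The "chances in 100" attached to each ratio can be recovered from the normal curve. The sketch below is a modern restatement and not the author's own computation: it assumes the figure is the one-tailed normal probability for the absolute value of Diff/σ_diff, rounded to a whole number, and the function name `chances_in_100` is purely illustrative.

```python
import math


def chances_in_100(critical_ratio: float) -> int:
    """Chances in 100 that a difference is a true one.

    Assumption: the figure is the one-tailed normal probability
    Phi(|Diff / sigma_diff|), rounded to the nearest whole chance.
    """
    z = abs(critical_ratio)
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return round(100 * phi)


# Ratios quoted in the text and in Table VII
for ratio in (2.44, 2.40, -2.27, -0.90, 8.78):
    print(ratio, chances_in_100(ratio))
```

Under this assumption the ratios 2.44 and 2.40 both give 99 chances in 100, and the grade XI ratio of -.90 gives 82, in agreement with the printed table.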


TABLE VII.—DIFFERENCES BETWEEN MEAN SCORES ON PARALLEL TESTS

                    NVC-VC                           NVA-VA
Grade     Diff   Diff/σ_diff   Chances     Diff   Diff/σ_diff   Chances
                               in 100                           in 100
V        -1.02     - 2.27         99       3.95       8.78        100
VII      -1.96     - 2.93        100       1.92       4.00        100
IX       -2.64     -11.00        100       1.86       5.17        100
XI       -3.45     - 8.21        100      - .43      - .90         82











SUMMARY AND CONCLUSIONS 


(1) The tests devised illustrate the possibility of obtaining highly 
comparable measures of verbal and non-verbal reasoning.¹ Such tests
may well serve to depict more completely an individual’s mental 
ability, and to examine more fairly in those cases where the language 
factor has to be minimized or controlled. They may also have 
diagnostic value in connection with the study and analysis of linguistic 
backwardness or disability. It is generally significant that the tests 
reliably and validly cover grades IV through XII and corresponding 
chronological ages. 

(2) As measured by the tests used, the development of verbal and 
non-verbal reasoning (whether concrete or abstract in nature) is a 





1 Apart from the artificial comparability of the verbal and non-verbal materials, 
intercorrelations of results on the different tests, as pointed out in a subsequent 
article, run above .70 and .80. 
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definite function of age. It is characterized by generally similar trends 
in all cases up to the one hundred ninety-eight-months’ mark. At this 
point, verbal reasoning seems to draw apart rapidly from non-verbal 
reasoning, which seems to stop developing after the one hundred 
seventy-fourth-months’ mark. On the basis of the scale of units used, 
verbal concrete reasoning is, with one minor exception, first in develop- 
ment for the entire age range, the differences between verbal concrete 
reasoning and non-verbal concrete reasoning tending to increase with 
age. Verbal abstract reasoning develops most slowly, but eventually 
passes both types of non-verbal reasoning, which run close together 
throughout. In terms of school grade or educational level, the 
development of these types of reasoning is very similar to the develop- 
ment with age. 

(3) The late superiority of verbal reasoning is due, no doubt, to the 
growing familiarity with verbal symbols (of things and situations) that 
comes with advancement in scholastic work and, in general, with 
increasing participation in a world of words. While this does not at all 
mean that non-verbal representations cannot be adequately dealt with 
by the adult, it probably does mean that he comes by habit or practice 
to prefer the verbal. 

(4) It should be particularly interesting to students of the growth 
of intelligence that verbal reasoning ability of the kinds tested (not 
just vocabulary ability) shows a strong tendency to increase rather 
than to decrease after the age of seventeen; and that the greatest 
acceleration at this time is evidenced by verbal reasoning of the 
abstract type, presumably one of the most advanced kinds of reasoning 
of which man is capable. 
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MEASURING THE ATTITUDES OF ELEMENTARY- 
SCHOOL CHILDREN TOWARD THEIR TEACHERS 


SISTERS M. AMATORA TSCHECHTELIN AND M. JOHN FRANCES 
HIPSKIND 
St. Francis High School, Lafayette, Ind. 
AND 
H. H. REMMERS 


Purdue University 


So far as formal education is concerned the proposition that the 
classroom teacher is one of the most fundamental factors in the learning 
of school children will find ready acceptance. All psychologists, 
whether eclectic in their allegiance to theory or as members of one or 
another of the various schools of psychology, agree upon the importance 
of the affective or feeling components of learning. Whether these are 
discussed under the varying currencies of psychological theory such 
as the law of effect, motivation, affective valence, or other similar 
headings, there is substantial agreement that children’s attitudes are 
of primary importance in the effective acquisition of knowledge, skill, 
interests, attitudes, ideals, etc., with which the school purports to 
concern itself. What the psychologists are agreed upon as sound 
theory would also be substantiated by an adequate sampling of the 
common-sense judgment of parents and children. 

Granting then the soundness of these two propositions—i.e., the
importance of children’s attitudes and the importance of the teacher 
in learning situations with which the school is concerned—it seems 
wholly logical and practically as well as theoretically important to 
obtain adequate measures of children’s attitudes toward their teachers. 
Reasonably adequate devices for measuring the attitudes of high-school 
pupils and college students toward their teachers are already available.¹
Yet in the light of modern psychological knowledge and theory it 
seems highly likely that the affective experiences of younger children 
have relatively greater potency in moulding a desirable personality 
than do the affective experiences of older individuals. On every count, 
therefore, it seems desirable to provide a measuring device for assaying 
the attitudes of elementary-school children toward their teachers and 
to apply such an instrument to pertinent problems. 





1 Remmers, H. H.: ‘‘ Appraisal of College Teaching Through Visitation, Ratings 
and Pupil Opinion.” 1939 Yearbook, the American Association of College Teachers 
of Education. 
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The general plan in developing the measuring instrument which is 
the subject of the present report was twofold: First, to provide a general 
survey instrument by means of which a rather generalized picture of 
children’s attitudes toward the teacher might be obtained and, second, 
to build a diagnostic instrument with two comparable forms which 
would provide detailed and specific information upon the strengths 
and weaknesses of teachers as viewed by children, to the end that 
‘‘remedial’’ measures might be adopted in order to change any unfa- 
vorable attitude situations which might be discovered to more desirable 
ones. 

Accordingly the major aspects of the teacher’s personality were 
through logical analysis subsumed under seven general areas. These 
seven areas as included in the scale are labeled: 


I. Liking for Teacher 
II. Ability to Explain 
III. Kindliness, Friendliness, and Understanding 
IV. Fairness in Grading 
V. Discipline (Keeping order with the children) 
VI. Amount of Work Required 
VII. Liking for Lessons 


For the general survey instrument (hereafter called the short scale) 
these general areas were turned into questions with instructions for 
the pupils to rate a given teacher on a five-point scale. For the diag- 
nostic instrument (hereafter called the long scale) there were assembled 
several hundred statements describing what teachers are or do. These
were scaled experimentally according to the equally often noticed 
difference principle based on the Weber-Fechner law.¹

For each of the seven areas of the short scale there were then 
selected seven statements as scaled which belonged in this category so 
that the whole long scale is made up of seven shorter scales with 
approximately equal scale distances between the various diagnostic 
statements. 

A sample of Form A of both the short scale and the longer scale as 
described above is shown on pages 197 and 198. 





1 Thurstone, L. L. and Chave, E. J.: Measurement of Attitude toward the Church. 
Chicago, University of Chicago Press, 1929. 
Thurstone, L. L.: “Theory of Attitude Measures.” Psychological Review,
Vol. xxxvi, 1929, pp. 222-241.
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For Form B the short scale is repeated but the diagnostic items are 
all different from those in Form A except that they parallel the items 
of Form A in scale values. 

Since only one form of the short scale is available, reliability of the 
scale was determined by an adaptation of the “split-test” procedure.
For thirty-one teachers in Grades IV to VIII widely distributed geo- 
graphically the papers of each of the thirty-one teachers were divided 
into two chance piles and correlated according to the following schema: 


Pupils 1 + 3 + 5 + 7 + . . . versus pupils 2 + 4 + 6 + 8 + . . .
A DIAGNOSTIC TEACHER-RATING SCALE

Sister M. Amatora Tschechtelin          Edited by H. H. Remmers
Form A

Name of School ________          Date ________
What grade are you in? Encircle one: 4 5 6 7 8
Are you a boy or a girl? Encircle one: Boy Girl   How old are you? ___ years,
___ months.

Following are a number of questions about your teachers. Please answer them
honestly. Your teachers will never know how you have rated them. Do not
write your name on this sheet.

5 means “the best;”
4 means “very good;”
3 means “average” or “about as good as any teacher;”
2 means “below average” or “less than for most teachers;”
1 means “very poor” or “the [illegible].”

Questions legible on the sample page (each is followed by four columns
of the ratings 5 4 3 2 1):

I. How well do you like your teacher?
IV. How fair is your teacher in grading?
V. How well does your teacher keep order with the children?

Form A (long scale)

Read each statement; if it tells something true about your teacher place a
plus sign (+) in the proper square at the left.

I. Liking for Teacher
1. Is the one I like the best.
2. Is humorous at times.
3. Keeps everything in the room neat.
4. Is pretty.
5. Is not polite.
6. [illegible]
7. [illegible]

IV. Fairness in Grading
22. Always gives the grades earned.
23. Gives fair grades.
24. Is quite fair in grading.
25. Gives fair grades sometimes.
26. Gives the boys better grades.
27. Grades some children too low.
28. Never is fair in grading.

V. Discipline (Keeping order with the Children)
29. Always keeps good order in a cheerful way.
30. Keeps good order.
31. Does not act “bossy.”
32. Is always on time.
33. Is too easy-going.
34. Has a quick temper.
35. Always finds fault with everything one does.

The correlations between these chance halves were then
stepped up by the Spearman-Brown formula with the results shown in
Table I. It will be noted that the range of reliabilities is from .86 for
“Amount of Work Required” to .96 for “Liking for the Teacher.”
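
The split-half procedure with the Spearman-Brown step-up can be sketched as follows. This is an illustration under stated assumptions, not the authors' worksheet: it assumes that each teacher's papers are split into odd- and even-numbered pupils, that the mean rating of each half is taken, and that the two sets of means are correlated across teachers. The function names and the ratings are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-length correlation up to full length: 2r / (1 + r)."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(ratings_by_teacher):
    """Correlate odd-pupil and even-pupil mean ratings across teachers."""
    odd_means = [sum(r[0::2]) / len(r[0::2]) for r in ratings_by_teacher]
    even_means = [sum(r[1::2]) / len(r[1::2]) for r in ratings_by_teacher]
    return spearman_brown(pearson_r(odd_means, even_means))


# Hypothetical five-point ratings, one list of pupils per teacher
hypothetical_ratings = [
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 2],
    [1, 2, 1, 2, 2, 1],
    [4, 4, 3, 4, 5, 4],
]
print(split_half_reliability(hypothetical_ratings))
print(spearman_brown(0.75))  # a half-test r of .75 steps up to about .86
```

The step-up always raises a positive half-test correlation, which is why the stepped-up values are the ones reported as the reliability of the full set of papers.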

The reliabilities of the long scale by areas and for the total scale are
given in Table II. The obtained reliabilities for Form A vs. Form B 
are amply adequate for such group comparisons as are involved in 
pupil-ratings of teachers ranging as they do from a value of .72 for 
discipline to .81 for amount of work required. Moreover, it is highly 
likely that the reliabilities as here given are relatively low because of 
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the probably low range of talent. A new sample of three hundred 
children in one school was used. Had a wider sampling been obtained 
it is likely that these reliabilities would have been increased. 


TABLE I.—RELIABILITY OF ALL RATINGS COMBINED FOR ALL AREAS OF THE SHORT
SCALE
(N = 31 teachers, 610 pupils)

Intra-attitudes                                        r      PE
I. Liking for teacher                                 .96    .007
II. Ability to explain                                .92    .015
III. Kindness, friendliness, and understanding        .88    .019
IV. Fairness in grading                               .89    .018
V. Discipline                                         .89    .018
VI. Amount of work required                           .86    .022
VII. Liking for lessons                               .87    .034





What of the validity of the measuring instrument? In one sense, 
since the instrument is designed to measure the attitude of pupils, it is 
sufficient to state that to the extent that such attitudes are reliably 


TABLE II.—RELIABILITIES OF THE LONG SCALE
(N = 300)

Intra-attitude     Forms A vs. B      Coefficients corrected
                   long scale         for attenuation
I                  .76 ± .016         .90 ± .007
II                 .73 ± .019         .93 ± .005
III                .75 ± .017         .93 ± .005
IV                 .80 ± .014         .97 ± .003
V                  .72 ± .019         .90 ± .007
VI                 .81 ± .014         .95 ± .004
VII                .76 ± .016         .87 ± .010
General scale      .79 ± .015         .87 ± .010





measured they are also validly measured, since we are here not con- 
cerned with measuring the characteristics that teachers would actually 
possess in the sight of some omniscient judge, but rather the character- 
istics which they possess in the eyes of the children whom they teach. 
As T. L. Kelley¹ pointed out more than a dozen years ago, “If competent
judges appraise Individual A as being as much better than Individual B
as Individual B is better than Individual C, then it is so, as
there is no higher authority to appeal to.”

1 Kelley, T. L.: The Influence of Nurture upon Individual Difference.
Macmillan, 1926, p. 9.

200 The Journal of Educational Psychology


TABLE III.—COEFFICIENTS OF RELIABILITY FOR THE RATINGS OF ONE SUPERVISOR,
AND COEFFICIENTS OF CORRELATION BETWEEN AVERAGE RATINGS OF PUPILS
AND AVERAGE OF TWO SUPERVISORS

Intra-attitude     Reliability for one     Correlation between
                   supervisor (r)          supervisors and pupils (r)
I                  .32                     - .09
II                 .41                       .10
III                .51                       .61
IV                 .33                     - .33
V                  .48                       .17
VI                 .38                       .17
VII                .73                       .25











For the long scale an added argument for validity is the logic under- 
lying the experimental construction of the scale. If verbalized 
opinions are measures of attitude, then, since the scale quite obviously 
measures verbalized opinions, it must also measure attitudes. 


TABLE IV.—MEANS, STANDARD DEVIATIONS, AND CRITICAL RATIO FOR TEACHERS
RATED AS HIGHEST AND LOWEST

Teacher rating   Grades        N      Mean           SD            Critical ratio of
                                                                   mean difference
Highest          5 and 6       59     4.7
Highest          7 and 8       68     4.7
Highest          5, 6, 7, 8    127    4.7 ± .026     .44 ± .019
                                                                   40.3
Lowest           5 and 6       59     1.9
Lowest           7 and 8       68     1.7
Lowest           5, 6, 7, 8    127    1.8 ± .067     .62 ± .027














Doubtless, however, prone as we are to subtle ego inflation by 
demanding the judgment of “mature” and “competent” persons it is
of interest to note the relationship of the ratings of supervisors to the 
ratings of the pupils as shown in Table III. The two striking facts 
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here are that the reliabilities of the supervisor’s judgment are of the 
same order of magnitude as those determined for students rating their 
instructors at the college level and the almost complete absence of 
relationship between the measures of pupils’ attitudes and super- 
visor’s attitudes. 

To the extent that individual teachers known to differ as judged by 
the attitudes of pupils toward them are differentiated it may be argued 
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Fig. 1. 


that the scale is valid. For the short scale sixty-eight pupils were 
asked to express their attitudes toward the best teacher they had ever 
known and the poorest teacher they had ever known. Table IV and 
Figure 1 show the results. The complete dichotomy and absence of 
overlapping is striking indeed. 
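
The critical ratio in Table IV can be approximately reproduced from the printed means and their ± values. The sketch below is a reconstruction under assumptions not stated in the article: the ± values are taken to be probable errors of the means, and the two sets of ratings are treated as independent, so that the error of the difference is the root of the sum of squares. The function name is illustrative.

```python
import math


def critical_ratio(mean_a, pe_a, mean_b, pe_b):
    """Difference between two means divided by the error of that difference.

    Assumption: the two means are independent, so the error of the
    difference is sqrt(pe_a**2 + pe_b**2).
    """
    pe_diff = math.hypot(pe_a, pe_b)
    return (mean_a - mean_b) / pe_diff


# Combined grades 5-8 from Table IV: highest 4.7 ± .026, lowest 1.8 ± .067
cr = critical_ratio(4.7, 0.026, 1.8, 0.067)
print(cr)
```

With the rounded means printed in the table, the ratio comes out within a few hundredths of the 40.3 reported.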

Yet another argument for validity rests upon the intercorrelations 
of the seven areas as measured by the long scale and also the correlation 
between the long scale and the short scale. Tables V and VI give the 
TABLE V.—INTERCORRELATION COEFFICIENTS OF THE INTRA-SCALE WITH THE
GENERAL SCALE

Intra-                                                                                      General
scales   I            II           III          IV           V            VI           scale
VII      .16 ± .046   .16 ± .046   .19 ± .046   .10 ± .047   .06 ± .050   .27 ± .045   .63 ± .029
VI       .33 ± .042   .12 ± .047   .24 ± .045   .20 ± .046   .16 ± .046                .66 ± .027
V        .20 ± .046   .06 ± .050   .15 ± .045   .16 ± .046                             .60 ± .031
IV       .32 ± .042   .24 ± .045   .24 ± .045                                          .69 ± .025
III      .16 ± .046   .25 ± .045                                                       .55 ± .033
II       .25 ± .045                                                                    .51 ± .035
I                                                                                      .66 ± .027























necessary data. Obviously each of the intra-scales measures something 
unique and independent of all the others. 
A scale closely similar to the long scale here described when applied 


TABLE VI.—CORRELATIONS OF SHORT SCALE versus LONG SCALE BY “AREAS” AND
TOTAL
(N = 300)

Intra-attitudes    Form A          Form B
I                  .88 ± .009      .80 ± .014
II                 .74 ± .018      .82 ± .013
III                .77 ± .016      .84 ± .011
IV                 .79 ± .015      .85 ± .010
V                  .84 ± .011      .75 ± .017
VI                 .83 ± .012      .88 ± .009
VII                .86 ± .010      .88 ± .009
General scale      .90 ± .007      .91 ± .006








to thirteen hundred fifty-seven children in Grades IV to VIII yielded 
results from which the following facts may be summarized: 

(1) The average attitude of children toward their teachers is sub- 
stantially favorable—an average of 8.62. The children in question 
varied from rural and city public and parochial school children, Grades 
IV to VIII, in northern cities to parochial school children in a southern 
city. The range of the averages was from 8.25 to 8.85. 

(2) Comparisons between rural and city children in every case were 
higher for the rural situation. The chances are approximately two 
hundred fifty-six to one of this being a true finding. 

(3) No statistically reliable differences exist as between public and 
parochial schools. 
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(4) No consistent trend in the averages was found as related to 
age or grade. 

(5) Highly significant differences were found among individual 
teachers and also among different subjects in schools where work was 
departmentalized. 

(6) No appreciable correlation was found between attitudes as
measured and group intelligence test scores (r = .1 ± .03, N = 552);
no relationship was found between achievement as measured by marks
and attitudes (r = .1 ± .03, N = 527) nor, as already observed,
was there any relationship between attitudes and chronological age
(r = .05 ± .03, N = 660).
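
The figure printed after each correlation in item (6) is consistent with the classical probable error of a correlation coefficient, PE = .6745(1 - r²)/√N. The sketch below assumes that this is the formula used, which the article does not state; the function name is illustrative.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a product-moment correlation.

    Assumption: PE_r = 0.6745 * (1 - r**2) / sqrt(N).
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Correlations and sample sizes quoted in item (6)
for r, n in ((0.10, 552), (0.10, 527), (0.05, 660)):
    print(r, n, round(probable_error_of_r(r, n), 2))
```

Each of the three cases rounds to .03, as printed. The same formula with N = 300 gives .009 for r = .88 and .018 for r = .74, the values shown for Form A in Table VI.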

In summary the preceding presentation warrants the statement 
that an instrument of sufficient validity and reliability has been 
constructed to make possible the following applications:

(1) Self-supervision and self-remediation of teachers in situations 
in which undesirable pupil attitudes are obtained is made possible. 

(2) An additional research instrument is provided by means of 
which it will be possible to study the interrelationships of the various 
factors operating in the total achievement of school children. 

(3) For the supervisor and the administrator so inclined it will be 
possible to obtain measures of known reliability and validity of the 
attitudes of children toward teachers under their direction. 

(4) In the training of prospective teachers while they are serving 
their apprenticeship or interneship as practice-teachers the instrument 
here provided can serve a most useful function. 


VALIDITY AND RELIABILITY OF THE PROPOSED 
CLASSIFICATION OF SPELLING ERRORS, II* 


GEORGE SPACHE 
Friends Seminary, New York City 


In the preceding article¹ a system of classifying spelling errors,
derived largely from the practices of earlier writers, was proposed. 
At the same time a number of statistical requirements were enumerated 
by which such a system might be evaluated. It is the purpose of 
this article to satisfy those requirements. 


VALIDITY 


One measure of the validity of a tabulation of errors is the extent to 
which its distribution resembles those of other systems. It is reason- 
able to presume that if a particular classification of errors results in 
distributing the errors in amounts similar to those found by other 
systems, that classification is not distorting what may be considered a 
usual or normal distribution. This presumption is based upon the 
further premise, which will be employed later, that the average of 
the number or per cents of errors of various types found in a number 
of studies represents the usual or normal distribution. 

In Table I the per cents of spelling errors reported in a number of
earlier studies are given. Comparison with the data
of the present study is complicated by the fact that different error
types were tabulated by almost every writer. A brief review of the
classifications described in the preceding article¹ will readily
demonstrate the varying concepts of what types spelling errors assume,
according to different students. In the present summary the data 
given by others have been tabulated in terms of the error types 
employed here. That is to say, whenever the error types of another 
writer might legitimately be combined under one of the present 
categories, they have been so treated. Error types counted by others
but not included in the present classification are omitted from the 
table. No other manipulation of the data of others has been employed. 

The data for the present study are derived from the spelling errors 
of twenty-five average and twenty-five poor spellers of grades III-V. 


* This is the second of three articles on diagnostic work in spelling. The 
preceding article proposed a system of classifying spelling errors. The third article 
will appear in The Journal of Educational Research. 
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The poor spellers were chosen at random from among those brought 
to the writer for assistance by parents and teachers. The average 
spellers were selected from the pupils of Friends Seminary and 
Brooklyn Friends School, New York City. They are children who, 


TABLE I.—PER CENTS OF SPELLING ERRORS




















Carle Carroll 
Book 
Reference & Floyd | Foster} Gill | Gill | Gill | Gill | Gill 
Harter) Fresh- ; 
on Upper} Bright} Dull 
Omissions 
Single letter 
Pieccédeeens 20.51} 7.0 5.6 | 26.6 |20.7| 4.0 | 12.0 |28.0/34.9/28.7|15.5)14.6 
Sounded......... —_ 5.0 0.0 = e 6.0 1“ 8.6) 6.3)10.0) “ - 
Doubled......... 3.5 9.9 | 13.12} 6.1 | 3.4) 6.0 © fedeeheoccsccenne 
Eb eccccccses sane D 2éee B seas 2.6 | 5.2 
OO eee 24.0 | 21.9 | 18.7 | 35.3 |29.3) 14.0 | 12.0 |36.6/41.2/38.7|30.8/27.4 
Additions 
Single letter 
Doubling........ 2 oO es err 3.01 1.5| 4.0 | 19.0 |33.4/22.5|21.0) 8.3) 9.6 
Non-doubling 
Phonetic...... 7.3 | .... | .... | 14.6 |18.1) 5.0 - 7” 35 * - as 
Non-phonetic..| ‘* 4.2} 10.0 ” ws " a a 9: - <5 + 
Rs cedvccoens canard Wane /M, wis Fr o- i ss. “3 “4 3 e 
pT OTe 1 .9 4.2 | 10.0 | 18.6 116.0) 9.0 | 19.0 |33.4)/22.5)21.0) 8.3) 9.6 
Transpositions 
Phonetic 
Non-phonetic 
0 Oe 6.3 5.3 5.0 5.8 | 3.3) 1.0] .... |10.5) 7.2) 5.0 
Phonetic substitutions 
, sR 27.8 | 21.3 | 20.1 | 24.6 22.3) 9.0] .... |11.7|18.6/23.5)35.4/37.3 
Consonant......... 2s 3.1 3.8 - ” 6.0 ey Be ™ “ 1 5.1) 6.4 
Diphthong......... ” sibs & wine ” wife. Feees Mee = ™ 
Pi necesece cs aa rahe EB ééoa & Mae Benen. sees 3.0; “ - * 117.9)17.5 
Entire word........ hvace 5.7 6.33 
inchs ehucans 27.8 | 30.1 | 30.2 | 36.8 (43.3) 15.0 7.0 |11.7/18.5|23.5/58.4/61.2 
Non-phonetic substi- 
tutions 
0 3.2 
Consonant......... 
Diphthong......... ™ 
Pi tcweds+ecne . 
Entire word........ ee, oer Brews Ot Fam 
eee 3.2 8 | 7.3 
Homonyms.......... 2.2 
Incomplete.......... cone Eanes. Benue E &6es Behe 
Unrecognizable....... nine Bdeed. sense 2 6see eee 









































* Ditto marks are used to indicate that the figure given includes the several types of errors 
immediately following. 

1 Figure also includes syllable omission. 

2 Includes additions of doubled letter.

3 Includes homonyms.
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Reference 


Gill 


Gill 


Men- 
den- 
hall 


Russell 
spellers 





Good | Poor 


Umberhine 
spellers 





Good | Poor 


Welch 


Wyckoff 


Pres- 
ent 
study 





Omissions 
Bingle letter 


Additions 
Single letter 
Doubling..... 
Non-doubling 
Phonetic... . 
Non-phonetic 
Syllable......... 


Transpositions 


12.5 
16.5 


29.0 


Phonetic........ ie hame oe 
Non-phonetic...}....|.... 


Phonetic substitu- 
tions 


Consonant...... 


5.6 


16.2 


9.8 


26.0 


oe) 
oo 


Diphthong...... ee 


Syllable......... 


Entire word..... ' 


Non-phonetic sub- 
stitutions 


Homonyms....... ey © 


14.0 


59.8 


14.3 


56.1 


Incomplete........ a 
Unrecognizable....}..../.... 











37.3 


37.3 
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4 Includes non-phonetic substitutions. 
5 Includes phonetic substitutions for diphthong. 
6 Includes omission of syllable.


on standardized spelling tests of two successive years, evidenced no 
greater acceleration or retardation than six months above or below 
their exact grade status. 

Table I is read, “In Book and Harter’s study of spelling errors,
20.5 per cent of the errors were classified as omissions of silent and
sounded letters and syllables.”
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For the convenience of the reader, the characteristics of these 
various studies are summarized in Table II.

In Table III a direct comparison is made between the mean per 
cents of errors found in the present study with the mean of the per 
cents given in the various studies cited in Table I. Such a comparison 
is based on the premise that the average of the per cents given by a 
number of writers represents the usual or normal distribution of 
spelling errors. The data of the present study as given in Table III 
differ from those in Table II because the latter gives the per cent of
the total number of errors made by the total group in each category.
The data for the present study in Table III are the mean per cents of 
errors for each type for the fifty cases. 
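
The distinction drawn here, between the per cent of all errors pooled over the group and the mean of the per cents computed case by case, can be made concrete with a small sketch. The two spellers and their error counts below are hypothetical and the function names are illustrative; they are not data from the study.

```python
def pooled_per_cent(cases, category):
    """Per cent of the group's total errors that fall in one category."""
    in_category = sum(case.get(category, 0) for case in cases)
    total = sum(sum(case.values()) for case in cases)
    return 100.0 * in_category / total


def mean_per_cent(cases, category):
    """Mean over cases of each case's own per cent in that category."""
    per_case = [100.0 * case.get(category, 0) / sum(case.values()) for case in cases]
    return sum(per_case) / len(per_case)


# Hypothetical error counts for two spellers
cases = [
    {"omission": 8, "addition": 2},    # 10 errors, 80 per cent omissions
    {"omission": 10, "addition": 30},  # 40 errors, 25 per cent omissions
]
print(pooled_per_cent(cases, "omission"))  # pooled over all 50 errors
print(mean_per_cent(cases, "omission"))    # average of 80 and 25
```

In this example the pooled figure is 36 per cent and the mean of the per cents is 52.5, because the pooled figure weights each speller by the number of errors made while the mean weights every speller equally.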

Equal numbers of studies could not be used in computing the mean 
per cent for each type of error because of the differences in the
classifications used by various writers. Each mean is computed from the
figures of those studies in which the errors noted are clearly similar
to the writer’s method of classification. Table III is read, “For
seven studies the mean per cent of errors of omission of a single silent 
letter is 17.8. The standard deviation of these seven studies is 9.8. 
The mean per cent for the fifty cases of the present study for the same 
error is 12.76, and the SD 3.98. The difference between the means is 
5.04 and its standard error, 3.7. The critical ratio of this difference 
is 1.36.” 
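
The standard error and critical ratio in this reading of Table III agree closely with the usual formula for the standard error of a difference between two independent means. The sketch below assumes that formula, which the article does not spell out, and treats the studies and the cases as the sampling units; the function name is illustrative.

```python
import math


def standard_error_of_difference(sd_a, n_a, sd_b, n_b):
    """Standard error of the difference between two independent means.

    Assumption: sigma_diff = sqrt(sd_a**2 / n_a + sd_b**2 / n_b).
    """
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


# Omission of a single silent letter: seven studies against fifty cases
se = standard_error_of_difference(9.8, 7, 3.98, 50)
cr = (17.8 - 12.76) / se
print(se, cr)

# Omission of a single sounded letter, second row of Table III
se_sounded = standard_error_of_difference(2.93, 7, 5.64, 50)
print(se_sounded)
```

The first computation gives a standard error of about 3.7 and a critical ratio close to the 1.36 reported; the second gives about 1.36, the standard error printed for sounded letters.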

In the larger types of total omissions, additions, transpositions, 
and phonetic substitutions the differences between the means are very 
small. It is only in the distribution of errors within these larger types 
that some real differences arise. Apparently the writer has classified 
more errors of omission of a single letter as omissions of sounded letters 
than is commonly found. This was probably due to the tendency to 
consider such letters as the e in consider as a sounded letter. There 
may be some question as to whether such a letter meets the definition 
employed in classifying errors,¹ namely, “a letter that is sounded in
the usual pronunciation of the words.”¹

In the same fashion the writer has classified more errors of phonetic 
substitution for a consonant than have others. Perhaps he has been 
more liberal in his interpretation of the term. There are also signifi- 
cantly fewer phonetic substitutions for a syllable in the data of the 
present study. This may be due to a rather strict interpretation of 
the term. To illustrate, in a substitution involving a three-letter 
TABLE III.—COMPARISON OF PER CENTS OF SPELLING ERRORS IN PRESENT STUDY
WITH THOSE FOUND BY OTHERS

                                     Other studies        Present study
                                     N    Mean    SD      Mean     SD     Diff.   σ_diff   CR
Omissions
  Single letter
    Silent                           7   17.8    9.8     12.76    3.98    5.04    3.7     1.36
    Sounded                          7    6.1    2.93    11.1     5.64    5.0     1.36    3.67
    Doubled                         14    8.69   4.1      6.56    3.46    2.13    1.2     1.77
  Syllable                           4    3.9    1.1      2.7     3.0     1.2      .71    1.69
  Total                             22   30.0    8.6     30.3     6.6      .3     2.05     .14
Additions
  Single letter
    Doubling                         9    4.3    1.7      3.12    2.19    1.18     .64    1.84
    Non-doubling
      Phonetic                       8   10.0    3.9     12.88    5.72    2.88    1.73    1.66
      Non-phonetic                   "     "      "        "       "       "       "       "
  Syllable                           4    1.3     .41      .96     .85     .4      .22    1.81
  Total                             22   14.5    6.5     16.0     5.44    1.5     1.58     .95
Transpositions
  Phonetic                                                1.5     1.34
  Non-phonetic                                            3.48    2.37
  Total                             17    5.2    2.64     4.72    2.74     .48     .73     .65
Phonetic substitutions
  Vowel                              7   28.6   10.8     18.0     6.75   10.6     4.2     2.52
  Consonant                          9    4.9    1.07     9.66    4.8     4.76     .77    6.18
  Diphthong                                               2.58    1.95
  Syllable                           8   14.3    4.96     4.92    3.00    9.38    1.85    5.07
  Entire word                                             2.24    2.42
  Total                             15   34.3   18.8     35.7     9.35    1.4     4.9      .28
Non-phonetic substitutions
  Vowel                                                   2.4      .52
  Consonant                                               3.64    1.42
  Diphthong                                                .66     .46
  Syllable                                                1.7     1.46
  Entire word                        2    4.0    3.2      1.0      .83    3.0     2.21    1.35
  Total                              3    3.7    3.2      7.86    5.13    4.16    1.98    2.1
Homonyms                             2    3.3    1.1      1.46    1.03    1.84     .79    2.32
Incomplete                           4    2.2    1.8      3.3     3.84    1.1     1.05    1.04
Unrecognizable                       3    4.0    4.3      2.8     3.36    1.2     2.53     .47
Phonetic and non-phonetic
  substitutions                     21   36.8   15.3     43.5    14.5     6.7     3.8     1.76

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syllable unless all three of the letters differed from the correct spelling, 
the error was not classified as a phonetic substitution of a syllable. 
The one or two errors in the syllable were classified under phonetic 
substitution for a vowel, consonant, or diphthong, in accordance with 
their type. This strict interpretation undoubtedly shifted the tabula- 
tion of some errors into the category, phonetic substitution for a 
consonant. This may account, in part at least, for the greater per 
cent in this type than found by others. 

Comparison between the medians of the present study and those 
of the group of earlier studies supports the evidence of significant dif- 
ference in several error types. The CR of the difference in omission 
of sounded letters is 2.54. In phonetic substitution for a consonant, 
it is 4.73 and in phonetic substitution for a syllable, 4.31. 

Since non-phonetic substitutions appeared in few of the classifying 
schemes, one might expect to find greater per cents of such errors in the 
present data. However, the mean is not reliably different from that
of the three writers employing such an error type. Similarly, when 
phonetic and non-phonetic substitutions are combined, as in most of 
the previous studies, there is no reliable difference between the means. 

On the whole, the distribution of errors resulting from the use of 
the proposed classification of spelling errors is remarkably like that 
found by averaging the findings of a number of earlier writers. The 
means of the present study differ reliably in only three of the twenty 
error types in which comparisons were possible. Significant dif- 
ferences in the comparisons of the medians of the present study and 
those of others occur in only two types. These differences are appar- 
ently due to slightly more liberal or strict interpretation of the 
definitions of the types. The differences indicate the necessity for 
some revision of the definitions. Stricter definition of the omission of 
a sounded letter and the phonetic substitution for a consonant are 
necessary. A more liberal definition of a phonetic substitution for a 
syllable is also probably desirable. With these revisions, the applica- 
tion of the system for classifying errors as at present defined should 
result in a distribution of errors similar to the average of those obtained 
elsewhere. 


RELIABILITY 


The reliability of a system of classifying spelling errors, like other 
measuring instruments, may be found by determining the split-half 
correlations between the errors in each half of the misspelled words. 
Of course, such reliability coefficients are influenced by several factors,
such as the judgment of the examiner and the extent to which the 
definitions of the error types deviate from complete objectivity. It is 
probably not possible to define all errors so that there can be no ques- 
tion as to the type to which a particular error belongs. The defini- 
tions offered in the preceding article were as complete and objective 
as the writer could make them, but they will not cover all spelling 
errors. Classification of some errors must rest upon the decision of 
the classifier. A further complication is the lack of reliability or con- 
sistency of error on the part of the children themselves. For these 
reasons, the reliability coefficients represent the tendencies to err con- 
sistently minus the subjectivity of the examiner and the subjectivity 
of the definition of the error type. 

In Table IV are given the corrected coefficients
between the number or per cent of errors in the first twenty-five words 
and the number or per cent of errors in the second twenty-five mis- 
spelled words. As previously noted, the population upon which the 
data are based was composed of twenty-five poor and twenty-five 
average spellers of Grades III-V. The misspelled words were derived 
from a test of one hundred words from the appropriate fifty per cent 
column of the Buckingham-Ayres Spelling Scale. The table is read: 
“The corrected reliability coefficient for omission of a silent letter
based on the number of errors is +.26.”
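The split-half procedure described above can be sketched as follows. The pupil counts are invented for illustration, and the correction is assumed to be the standard Spearman-Brown step-up for a test of double length, 2r/(1 + r); the article says only that the coefficients are "corrected". The function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def step_up(r_half):
    """Spearman-Brown estimate for the full sample from a half-against-half r."""
    return 2 * r_half / (1 + r_half)


def step_down(r_full):
    """Inverse of step_up: the half-against-half r implied by a corrected r."""
    return r_full / (2 - r_full)


# Hypothetical counts of one error type for six pupils, in the first and
# second twenty-five misspelled words respectively.
first_half = [3, 1, 4, 2, 0, 5]
second_half = [2, 2, 3, 1, 1, 4]

r_half = pearson_r(first_half, second_half)
r_corrected = step_up(r_half)
print(round(r_half, 3), round(r_corrected, 3))

# Raw correlation implied by the corrected +.26 quoted for the silent letter.
print(round(step_down(0.26), 3))
```

Under this assumption a corrected coefficient of +.26 corresponds to a raw correlation between halves of about .15.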

The writer realizes, of course, that the small size of the population 
upon which these coefficients are based limits their dependability. 
However, some tentative conclusions are possible. 

Most of the coefficients are too low for satisfactory reliability of
diagnosis. The lack of satisfactory reliability is more strikingly evi- 
dent when one considers that these coefficients are based on a three- 
grade range; namely, Grades III-V. One immediate conclusion is that 
a sample of fifty misspelled words is too small for reliable interpretation 
of the tendencies of individuals to make certain types of errors. 
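How much larger a sample might be needed can be estimated with the general Spearman-Brown formula, r_k = k·r / (1 + (k − 1)·r). The sketch below is illustrative only: it treats the fifty-word sample as the unit, assumes that added words behave like the original ones, and uses the +.26 quoted for omission of a silent letter. The target of .80 is an arbitrary example, not a standard proposed in the article.

```python
import math


def spearman_brown(r, k):
    """Predicted reliability when the sample is made k times as long."""
    return k * r / (1 + (k - 1) * r)


def lengthening_needed(r, target):
    """Factor k by which the sample must grow to reach the target reliability."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("r and target must lie strictly between 0 and 1")
    return target * (1 - r) / (r * (1 - target))


r_50 = 0.26  # corrected coefficient for a fifty-word sample (from the text)

for words in (50, 100, 200, 400):
    print(words, round(spearman_brown(r_50, words / 50), 2))

k = lengthening_needed(r_50, 0.80)
print(math.ceil(k * 50), "misspelled words for a predicted reliability of .80")
```

On these assumptions the sample would have to be roughly eleven times as long before this one error type reached .80.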

Fifteen of the coefficients based on per cent of error are greater than 
the r’s for the same types based on number of errors. Twelve of the r’s
based on number of errors are greater than the coefficients of the same 
categories based on per cent of error. This would seem to indicate 
slightly greater reliability for the classification on the basis of per cent 
of error. 

By means of the Spearman-Brown formula the reliability for the
entire classification (omitting negative r’s) based on the number of
errors is +.948. The r based on per cent of error is also +.948. This
may be interpreted to mean that the classification as a whole, despite
the unreliability of a number of its parts, is a satisfactory one insofar
as reliability is concerned. Using the coefficients of the present study
and the values estimated by the Spearman-Brown formula, classifications
based on the widely used types of total omissions, additions,
transpositions, phonetic and non-phonetic substitutions would have
total reliabilities of +.83 for number of errors and +.85 for per cent
of errors.

TABLE IV.—RELIABILITY COEFFICIENTS OF VARIOUS ERRORS BASED ON FIFTY
MISSPELLED WORDS OF FIFTY CASES

                            | Number of | Per cent of
Type of error               |  errors   |   errors
----------------------------+-----------+------------
Omissions                   |           |
  Single letter             |           |
    Silent                  |   .26     |   .374
    [illegible]             |   .214    |   .496
    [illegible]             |   .275    |   .374
    [illegible]             |   .507    |   .684
    [illegible]             |   .360    |   .412
Additions                   |           |
  Single letter             |           |
    [illegible]             |   .374    |   .360
  Non-doubling              |           |
    [illegible]             |   .709    |   .666
    Non-phonetic            |   .507    |   .591
    [illegible]             |   .165    |   .333
    [illegible]             |   .601    |   .601
Transpositions              |           |
    [illegible]             |   .113    |   .058
    [illegible]             |   .581    |   .620
    [illegible]             |   .518    |   .529
Phonetic substitutions      |           |
    [illegible]             |   .148    |   .449
    [illegible]             |   .620    |   .561
    [illegible]             |  -.03     |   .019
    [illegible]             |   .400    |   .275
    [illegible]             |   .529    |   .507
    [illegible]             |   .449    |   .540
Non-phonetic substitutions  |           |
    [illegible]             |   .802    |   .795
    [illegible]             |   .473    |   .412
    [illegible]             |   .360    |   .305
    [illegible]             |   .400    |   .374
    [illegible]             |  -.01     |   .245
    [illegible]             |   .571    |   .581
    [illegible]             |   .214    |  -.18
    [illegible]             |   .843    |   .742
    [illegible]             |   .773    |   .750
    [illegible]             |   .400    | [illegible]

It is apparent that the magnitude of the coefficients is not always in 
direct ratio to the objectivity of the type of error. For example, in 
such almost purely objective error types as omission of a sounded letter, 
addition by doubling, and homonyms, the reliability coefficients are 
below +.40. In the error types of omission of a syllable, phonetic and
non-phonetic additions, phonetic substitutions for an entire word,
incomplete and unrecognizable, which are dependent in part upon the
judgment of the examiner, the coefficients are above +.50. A partial
explanation of this apparent inconsistency may be the inconsistency
with which the subjects err in particular types. A possible, but less probable,
explanation may be that the writer was less consistent in classifying 
objective types of errors than in judging other types. 


CONCLUSIONS 


Statistical analysis of the results of the application of a proposed 
system of classification of spelling errors has demonstrated: 

(1) That the classification is valid in that its distribution of errors 
is remarkably like that found by averaging the findings of a number of 
earlier writers. 

(2) Significant differences between the mean per cents of error in 
the present study and the means found by averaging the findings of 
others appear in omission of a single sounded letter, phonetic substitution
for a consonant and a syllable. The last two differences are confirmed 
in a comparison of the medians of the present data and the medians of 
the reports of others. 

(3) That these differences may be eradicated by redefinition of the 
errors. 

(4) That the tendency to err in certain ways is insufficiently reliable 
to warrant use of these types of diagnosis. 

(5) That a sample of fifty misspelled words is probably inadequate 
for reliable interpretation of the tendencies of individuals to make 
certain types of errors. 

(6) That the total reliability of the classification is satisfactory. 

(7) That the magnitude of the reliability coefficient of an error type
is not necessarily in direct ratio to the objectivity of the definition of
the error.

(8) That there is slightly greater reliability for the classification on
the basis of per cent of error rather than number of errors.

(9) That the classification as a whole is superior in reliability to
common classifications by means of certain gross types of errors.


AN OBJECTIFIED PRACTICAL TEST FOR CLINICAL
PSYCHOLOGISTS 


GRACE MUNSON, MILTON A. SAFFIR AND HELEN U. CHAMNESS 
Bureau of Child Study, Board of Education, Chicago, Illinois 


The most frequent and most serious criticism of objective tests as 
an examination method has been the alleged lack of versatility. The 
work of many able psychologists and educators in this field in recent 
years has dispelled the notion that objective testing is limited to the 
true-false type of question with its limitations, for a multitude of 
diversified objective test forms and devices have been and still are 
being invented. Much work has been done, too, to demonstrate that 
the methodology of objective testing is not limited, as some critics 
insist, to the examination of subjects and skills learned through rote 
memorization. 

The authors of this paper have attempted to apply the methodology 
of objective tests to the problem of evaluating the skills and techniques 
of clinical psychologists as distinct from their knowledge and their 
record of training and experience. It is obvious to anyone who has 
worked in clinical psychology that there is a real and important differ- 
ence between knowing a given body of subject-matter and being able 
to use that knowledge in practical situations. Since practical skill 
depends on wide knowledge, it is essential that a clinical psychologist 
should have courses not only in the field of clinical psychology, but 
also in all the fields of pure psychology. It is lamentable that so 
many clinics, hospitals, courts, schools, and other institutions which 
employ clinical psychologists do not realize this, and consequently 
entrust important responsibilities to individuals who may be skillful 
in dealing with people, but who are inadequately prepared so far as 
academic training is concerned. Conversely, since knowledge without 
the ability to apply it in a practical way is useless to a clinical psy- 
chologist, it is essential that an individual’s academic training be 
supplemented by closely supervised experience and by the necessary 
personal aptitudes. It is regrettable that many colleges, universities, 
professors, and instructors who recommend candidates for positions in 
schools, clinics, etc. do not realize this, and consequently lend their 
prestige to individuals with adequate degrees, courses, and marks, but 
with inadequate personal aptitudes and practical skills. 

The examination to be described here was designed to measure the
practical skill of candidates for positions as clinical psychologists in
the Bureau of Child Study of the Chicago Board of Education. It is
supplementary to an evaluation of the candidate’s training, knowledge, 
experience, and personality—these being handled through eligibility 
requirements, written examinations, and oral interviews. To be 
eligible to take the examination a candidate must have received at 
least a Master’s degree, with not less than eighteen majors covering 
psychology and the educational field in which the psychology would 
be applied, must have had at least one year of teaching experience and 
one or more years of experience as a psychologist in a clinic of recog- 
nized status. The written examinations consist of a three-hour paper 
in the field of clinical psychology and related educational techniques, a 
two-hour paper in academic psychology and education, and a paper in 
English. The practical examination described in this paper is given 
after the written examinations are completed. The oral interview 
includes a conference with administrators of the school system, and a 
weighted evaluation of credentials. 

The feature of objective testing which is most fundamental— 
though often scarcely recognized—is not the elimination of subjective 
judgment but the shift, so far as possible, of this process from the 
marking of the answers to the formulation of the test questions. In 
making this shift for our practical examination we found the following 
criteria essential: 

(1) The test must evaluate the candidate’s practical abilities—his 
actual skills and techniques rather than his knowledge about skills 
and techniques. Since in clinical psychological work knowing is a 
necessary condition for doing, the test must involve the candidate’s 
knowledge, but it must evaluate that knowledge only as it is part of 
the doing. An example of this point might be a question regarding 
the finding of a child’s IQ. If we ask the candidate to tell us how to 
find an IQ we are investigating his knowledge, but if we give him raw 
data and watch him find the IQ as he would do in the clinic we are 
investigating his ability to apply this knowledge. That this is not 
merely a theoretical distinction was made clear to us when we found 
several candidates who knew how the IQ is found, but who, in the 
practical test, were too careless in arithmetic or too helpless when 
deprived of the tables or too lacking in judgment when the data were 
too low to be in the available tables, to handle adequately so simple 
yet fundamental a bit of knowledge. 

(2) The test must be extensive—should be a wide sampling of the 
practical skills needed by a clinical psychologist. The requirement of 
adequate sampling applies to any test, of course, but is especially
pertinent to a practical test, since in this field equally important items 
are especially prone to differ in the amount of time they require and 
in the ease with which they can be framed in objectified form. 

(3) The test should include an objectified technique for evaluating 
personal qualities necessary in the practical work of a clinical psychol- 
ogist. These, as distinguished from such personality qualities as 
neatness, and ease in conversation, which can be observed in the ordi- 
nary oral interview, are intimately tied up with practical clinical 
psychological work and cannot be adequately observed in isolation. 
Examples are the ability to meet unusual situations during a mental 
test, the quality of judgment used during testing, and intellectual 
honesty. 

(4) The test should not require too much time for administration. 
This is peculiarly important for a practical test because items tend to 
be time-consuming, and because the test must be administered to one 
candidate at a time. 

(5) The test should be equally difficult for all candidates. This is 
automatically taken care of in a group test, but must be given separate 
consideration when the test is administered individually, with the 
danger of the questions becoming known to the candidate before he is 
tested. One approach is to give different questions to each candidate. 
The problem of equivalence must then be considered. Another 
approach is to make the test consist of a vast number of items, so 
that the number of items that a candidate can learn about in advance 
is negligible. With this solution we avoid the difficulties of equating 
items, but we substitute the difficulties of inventing many short but 
valid items. 

(6) The test should be objective in its administration. The 
importance of this requirement is self-evident whenever a test is to 
be repeated, such as with examinations given individually. The 
elimination of complexity is the method used for making written 
questions objective, but practical situations are characteristically 
complex. 

(7) The test must be scorable with ease and objectivity. This
means that the items must be so set up that correct answers should 
be easily distinguishable from incorrect ones. It is preferable, of 
course, that the scoring involve the exercise of no judgment at all. 
It is possible to frame some practical questions in precisely this way, 
but for many parts of the examination we found it necessary to content 
ourselves with objectified rather than objective scoring. By this
we mean narrowing down the issues to be judged. If we require the 
candidate to perform a complicated task, our judgment of the ade- 
quacy of his performance is quite subjective. We can objectify our 
judging in one of two ways: by simplifying the task to be performed, or 
by judging the performance by a series of simple criteria rather than 
the one general criterion of adequacy. 

In the light of the above criteria, the authors devised a program of 
tasks and questions to constitute an objectified practical test for 
clinical psychologists. It is not a test that can be applied in its entirety
to candidates for positions as clinical psychologists in other institu- 
tions, but it may be suggestive to others who have the need for con- 
structing a practical test for clinical psychologists. We feel also that 
this material may be of value to prospective candidates for such posi- 
tions and to those who train such candidates, since the examination 
is an index to the skills, techniques, and abilities needed by clinical 
psychologists in agencies such as the Bureau of Child Study. 

The examination as administered required a half day for each 
candidate. It consisted of eleven parts, each of which was weighted 
between one and seven according to our estimate of its importance, 
so that there was a total weight of twenty-five. 
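The weighting can be made concrete with a little arithmetic. The sketch below takes the weights from the part headings of this paper. Since the paper does not say how part scores were combined, it assumes that each part is first expressed as a fraction of its maximum and then multiplied by its weight. The names PART_WEIGHTS and composite_score are illustrative only.

```python
# Weights of the eleven parts, as given in the part headings.
PART_WEIGHTS = {1: 2, 2: 2, 3: 2, 4: 1, 5: 7, 6: 1, 7: 1, 8: 2, 9: 3, 10: 1, 11: 3}


def composite_score(fractions):
    """Weighted composite on a 0-100 scale.

    `fractions` maps each part number to the share of that part's maximum
    earned by the candidate (0.0 to 1.0). This combination rule is an
    assumption made for illustration, not the authors' stated method.
    """
    if set(fractions) != set(PART_WEIGHTS):
        raise ValueError("a fraction is required for every one of the eleven parts")
    for part, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"part {part}: fraction must lie between 0 and 1")
    total_weight = sum(PART_WEIGHTS.values())
    earned = sum(PART_WEIGHTS[p] * fractions[p] for p in PART_WEIGHTS)
    return 100.0 * earned / total_weight


# Hypothetical candidate: strong everywhere except the Binet administration.
candidate = {p: 0.9 for p in PART_WEIGHTS}
candidate[5] = 0.5
print(sum(PART_WEIGHTS.values()), round(composite_score(candidate), 1))
```

Because Part 5 carries seven of the twenty-five weight units, a weak showing there lowers the composite far more than the same weakness in any other part.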

Part 1—Repertoire of Tests (Weight 2).—Each candidate was 
required to tabulate all the tests with which he was prepared to demon- 
strate his skill, under the following headings: Individual Intelligence 
(verbal), Group Intelligence, Performance, Special Aptitude, Achieve- 
ment, Diagnostic, Personality and Miscellaneous. Later the exam- 
iners asked the candidate some question about each test on his list, 
such question being designed to reveal whether he was actually 
familiar enough with the test to use it. For example, if the New 
Stanford Reading Test were listed, the examinee was asked to describe 
the procedure for scoring, or to make up a sample item in the same form 
as in the test. If these questions were all answered satisfactorily, 
the candidate was given credit for each test in his repertoire, according 
to a prearranged schedule of five points for each individual test battery, 
two for each test of special aptitudes, one for each achievement test, 
etc. No credit was given for a test if the answers to the examiner’s 
questions indicated lack of real familiarity with it. 

It is obvious that a psychologist’s test repertoire represents some 
of his practical ability. The richness of the psychologist’s stock of 
usable tests is a function of his actual experience in the field. In this 
part of our examination we were interested only in how many and which
tests the candidate was ready to use; the skill with which he could use 
each was evaluated in other parts of the examination. Part 1 of our 
test met all our requirements for objectivity. 

Part 2—Familiarity with Test Blanks, Statistical Forms, Etc. (Weight 
2).—The candidate was given a folder containing twenty-six pages 
torn from various tests, statistical forms, recording blanks, etc., such 
as the following: Gates Primary Reading Test, Kwalwasser-Dykema 
Test, Strong Vocational Interest blank, Cornell-Coxe record blank, 
Hollerith card, Thurstone correlation data sheet. He was asked to 
record for each one the type of test from which the page came, and 
the name of the test. If he was unfamiliar with a sample he was 
encouraged to guess the type of test to which it belonged, but not the 
name, since, it was explained to him, a psychologist with wide experi- 
ence develops judgment in recognizing the type of test to which an 
unknown sample belongs, whereas guessing the name wrong discloses 
his ignorance about two tests. In scoring, four points were allotted 
to each sample except two, which received three points. In general, 
one point was given for recognizing the general type of test (“reading”),
a second point for a finer classification of the type (“aptitude”),
a third for the approximate name (“Monroe Readiness”) and the
fourth for the exact name (“Marion Monroe Reading Aptitude Test”).

There seems to be little doubt that this part of the examination 
meets the criteria we have listed above. It involves practical matters, 
it is wholly objective, it is extensive, yet requires relatively little 
time. 

Part 3—Familiarity with Test Objects and Devices (Weight 2).— 
The candidate was shown fourteen test objects, such as the O’Connor 
Wiggly Blocks, a Kohs Block Design card, a Marion Monroe Word 
Discrimination card, etc. Instructions were to record the type of 
test (guessing if not familiar), the batteries of which it was a part, and 
the name of the test. This part is similar to Part 2 in scoring and in 
the justification for its inclusion in our examination. 

Part 4—Calculations in Clinical Work (Weight 1).—The justifica- 
tion for including this type of work sample, which is perfectly objective 
in administration and scoring, has already been discussed in the early 
part of this paper. The problems were designed to require the exam- 
inee to calculate the CA from birthdate and date of test (knowing 
what to do with an odd sixteen days), to know how many tests there 
were at various age levels in a given battery and how much credit to 
give for each, to calculate an IQ after correcting the CA without the
use of tables, to know how to use the tables, and to know what to do 
if for a given CA the MA was too low or too high to be included in the 
tables. 
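The kind of calculation required in Part 4 can be sketched as below. Two points are assumptions and are labeled as such in the code: that a remainder of sixteen days or more is counted as an additional month (the paper mentions the odd sixteen days without stating the rule), and that a thirty-day month is borrowed when the day of the test falls before the day of birth. The correction of CA for older subjects is not modeled. The IQ is taken as the ratio 100 × MA / CA.

```python
from datetime import date


def chronological_age_months(birth: date, tested: date, round_up_from_day: int = 16) -> int:
    """Chronological age in whole months at the date of test.

    Assumptions: a 30-day month is borrowed when needed, and a remainder of
    `round_up_from_day` days or more counts as one further month.
    """
    if tested < birth:
        raise ValueError("date of test precedes date of birth")
    months = (tested.year - birth.year) * 12 + (tested.month - birth.month)
    days = tested.day - birth.day
    if days < 0:
        months -= 1
        days += 30
    if days >= round_up_from_day:
        months += 1
    return months


def ratio_iq(ma_months: int, ca_months: int) -> int:
    """Ratio IQ, 100 x MA / CA, rounded to the nearest whole number."""
    if ca_months <= 0:
        raise ValueError("chronological age must be positive")
    return round(100 * ma_months / ca_months)


# Hypothetical case: born 10 March 1930, tested 26 September 1938, MA of 7-6.
ca = chronological_age_months(date(1930, 3, 10), date(1938, 9, 26))
print(ca // 12, ca % 12, ratio_iq(7 * 12 + 6, ca))
```

Working in months throughout avoids the arithmetic slips that arise when years and months are divided separately.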

Part 5—Administration of Revised Stanford-Binet, Form L (Weight 
7).—This was the most important single part of the practical test, 
since in our clinic, as in most others, this test is basic. There were 
three sections to this part of the examination, as follows: Prepared 
Questions (weight 1), Vocabulary Test (weight 1), and Administration 
of Test (weight 5). 

Prepared Questions—The examiners prepared a list of items with 
which the examinee might demonstrate, without the manual, his 
familiarity with the Revised Stanford-Binet, Form L, as, for example, 
reproducing the three bead-chain patterns, folding and cutting the 
paper for the highest-level paper-cutting test, reproducing the Mem- 
ory for Designs drawings, etc. A second list of items was prepared 
for the commonly-used performance tests. Each candidate performed 
three items from the first list and two from the second, the selection 
being made at random by the examiner, but a different set for each 
candidate.
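The random but distinct selection of prepared items can be sketched as follows. The contents of the two lists are placeholders: only the first three Binet items are named in the text, and every other entry is hypothetical. The function draw_item_sets is an illustrative name, not part of the authors' procedure.

```python
import random
from itertools import combinations, product


def draw_item_sets(first_list, second_list, n_candidates, seed=None):
    """Give each candidate three items from the first list and two from the
    second, chosen at random, with no two candidates receiving the same set."""
    all_sets = list(product(combinations(first_list, 3), combinations(second_list, 2)))
    if n_candidates > len(all_sets):
        raise ValueError("not enough distinct item sets for this many candidates")
    rng = random.Random(seed)
    return rng.sample(all_sets, n_candidates)


# Items named in the text, padded with hypothetical placeholders.
binet_items = [
    "bead-chain patterns",
    "highest-level paper cutting",
    "Memory for Designs drawings",
    "hypothetical Binet item D",
    "hypothetical Binet item E",
]
performance_items = [
    "hypothetical performance item 1",
    "hypothetical performance item 2",
    "hypothetical performance item 3",
    "hypothetical performance item 4",
]

for number, (binet, performance) in enumerate(
    draw_item_sets(binet_items, performance_items, n_candidates=4, seed=1), start=1
):
    print(number, binet, performance)
```

Sampling from the full set of combinations guarantees distinct sets, whereas independent draws for each candidate could repeat by chance.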

Vocabulary Test—The candidate administered a selected random 
sampling of the Vocabulary Test to the examiner, who posed as the 
pupil. The latter responded with answers that had been prepared 
in advance. The examiner then evaluated the candidate’s procedure, 
and his scoring of the responses. Part of the scoring was as objective 
as the Vocabulary Test itself, but part involved the use of more judg- 
ment by the examiner. The latter part was objectified, however, by 
setting up a specified number of relatively simple items to be judged, 
e.g.: Did the candidate record the responses fully? Did he obtain 
good rapport? Did he use supplementary questioning correctly? 
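Objectified scoring of this kind amounts to replacing one global judgment with several yes-or-no judgments. The sketch below uses the three questions quoted above as the checklist. Equal weighting of the questions is an assumption, and the names CHECKLIST and objectified_score are illustrative.

```python
# The three simple items quoted in the text; the full list used by the
# examiners is not given in the paper.
CHECKLIST = (
    "recorded the responses fully",
    "obtained good rapport",
    "used supplementary questioning correctly",
)


def objectified_score(judgments):
    """Share of checklist items judged satisfactory (equal weights assumed).

    `judgments` maps every checklist item to True or False.
    """
    missing = [item for item in CHECKLIST if item not in judgments]
    if missing:
        raise ValueError(f"no judgment recorded for: {missing}")
    return sum(bool(judgments[item]) for item in CHECKLIST) / len(CHECKLIST)


# Hypothetical judgments by one examiner for one candidate.
example = {
    "recorded the responses fully": True,
    "obtained good rapport": True,
    "used supplementary questioning correctly": False,
}
print(round(objectified_score(example), 2))
```

Each item still calls for judgment, but the judgment is narrow enough that two examiners are more likely to agree on it than on a single rating of adequacy.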

Administration of Test.—The candidate administered and scored 
one complete year level in the middle range of the scale, and also five 
individual tests chosen from the other sections of the scale. The 
examiner posed as a pupil, but gave responses that had been prepared 
in advance and exhibited behavior formulated in advance. Scoring 
involved the same procedures as were mentioned for the Vocabulary 
Test. 

Part 6—Administration of Group Intelligence Test (Weight 1).—The 
procedure and scoring were along the same lines as that of the Binet 
scale. The Kuhlmann-Anderson test is the group test used in the 
Chicago public school system. The Grade 1 Semester 1 form of the
test was used here since it involves the most complicated group testing
techniques.

Part 7—Measurement and Diagnosis of Reading Ability (Weight 1). 
The administration of Gray’s Oral Reading Test and scoring according 
to the Monroe classification were required. Procedure and scoring 
were essentially the same as for the Binet scale. 

Part 8—Administration of Performance Tests (Weight 2).—The 
battery used depended on the ones which the candidate had listed in 
Part 1. For the most part either the Grace Arthur or the Cornell- 
Coxe batteries were used. Procedure and scoring were analogous 
to that used in the test on the Revised Stanford-Binet. 

Part 9—Summarizing Examination Data and Making Proper 
Recommendations (Weight 3).—The examiners prepared a case study 
report in the same form as the examination reports which are prepared 
by the Bureau of Child Study psychologists, except that the Summary 
and the Recommendations were omitted. The report included 
identifying data, problem, school history, mental test data, achieve- 
ment test scores, social history, and physical appraisal. The report 
was so formulated as to present a specified list of factors that are 
commonly considered of importance in problem children. The candi- 
date was to write the Summary and the Recommendations from the 
material he was given. He was aided by having a printed list of the 
factors usually found and recommendations for placement and treat- 
ment usually made by the staff of the Bureau of Child Study. His 
summary and recommendations were evaluated according to whether 
the factors written into the case were taken into account. It is true 
that a good deal of subjective judgment was used in grading this part 
of the examination, but the specific factors written into the case make 
a clear frame of reference for the judgment and evaluation. 

Part 10—Coding Examination Data for Research Purposes (Weight 
1).—The data obtained in Bureau of Child Study examinations, as 
well as the factors analyzed and the recommendations made, are 
punched upon a Hollerith card according to a code. The data are thus
available for research purposes. The extent to which a candidate 
could adapt himself to routines, and his accuracy and ability in follow- 
ing detailed, complicated instructions were tested by having him code 
the prepared case and his summary and recommendations. Since 
the instructions on the code sheet are perfectly explicit, the coding 
could be scored with perfect objectivity. 
Part 11—Personality Factors as Part of Skill and Technique (Weight
3).—Although personality factors were already considered in evaluat- 
ing the candidate’s ability to administer the various tests, it was felt 
that these aspects of his practical skills ought to be graded separately 
as well. This was done by the collective judgment of the three 
examiners who sat as a committee immediately after they had observed 
the candidate in the exhibition of his various skills and techniques. 

The four main personality factors which were judged were: (1)
Psychological insight and judgment, (2) General mastery of the clinical
situation, (3) Poise and professional bearing, and (4) Honesty and
sincerity. The evaluation of the degree to which the candidate
possessed the above qualities was based on subjective judgments, to 
be sure, but an effort was made to objectify even this by formulating 
specific traits to be looked for, and by creating specific situations that 
would call them forth. 

One aspect of psychological insight and judgment, for example, was 
observed by noting whether the candidate knew when and how to 
depart from a robot-like adherence to the test directions. An example 
of a test of a candidate’s general mastery of the clinical situation was 
his reaction to rebelliousness or distractibility on the part of the 
examiner, posing as the pupil. One good test of the examinee’s poise 
and professional bearing was his reaction to certain of his own failures 
in this practical examination, to which the examiners made him react. 
Finally, as an example of a prearranged test of honesty and sincerity, 
we might cite the use we made of the candidate’s listing of tests in part 
I of this examination as compared with his actual familiarity with 
them. 


SUMMARY 


The authors, after setting up seven criteria for an objectified 
practical test for clinical psychologists, have described such a test as 
used in the Bureau of Child Study of the Chicago Public Schools. The 
test consists of eleven parts, and requires a half day to administer. 

The only validation which was used by the authors was their own 
professional experience and judgment. It would be of some interest 
to give our test to established clinical psychologists and to those who 
teach clinical psychology. The authors did not attempt this because 
of obvious considerations in the realm of “practical” psychology. 


A COMPARISON OF SCORES ON A STUDY INVENTORY 
WITH SELF-ADJUDGED IMPROVEMENT ON A 
NUMBER OF STUDY FACTORS 


N. FRANKLIN STUMP 
Keuka College, Keuka Park, New York 


Casual observation has indicated to teachers that there is a wide 
discrepancy between the methods of study used by freshmen and the 
procedures adopted by sophomores in the preparation of assignments. 
The different methods of study are not always as much in evidence as 
the difference in achievement secured from the two groups. When no 
specific training in techniques of study is given to individuals during 
their college careers, the surprising thing is that they seem to acquire, 
on their own initiative, some procedures which are effective. Of 
course, improved-method-of-study is only one of many factors which 
may make a college student’s achievement more effective as deter- 
mined by his grades, the longer he remains in college. If the factors, 
however, could be discovered which make sophomores, rather gener- 
ally, better students than freshmen of equal general ability, then 
teachers would be able, partially, to bridge the gap between the poor 
and the good procedures in study by acquainting the individual with 
the factors he should avoid and introducing others he should adopt 
earlier in his school career. Since college work is on a higher 
plane and calls for methods that are improved over, or at least more 
independent than, those of high-school study, college students could 
avoid many of the blind alleys of the freshman year, and so adjust 
more speedily with less wasted energy, if the specific factors which 
seem important for successful work were revealed to them. 

In going from high school to college the causes of student diffi- 
culties seem to center about the problems of adjustment relative to 
techniques of study. Wood and Learned¹ listed ten difficulties which 
college students are generally required to solve. The two 
difficulties to which these authors attach most significance are 
“Difficulty in working out and observing a study schedule” 
and “Failure to adjust promptly to classroom methods which differ 
from those previously used (lectures, etc.).” More emphasis, 
undoubtedly, should be given to the analysis of factors which are 
essential in making study procedures effective. 

A modified form? of the Pressey Study Inventory was adminis- 
tered to forty-six freshmen during the ninth week of their first semes- 
ter’s work and fifty sophomores during the ninth week of their third 
semester’s work at Keuka College. Scores on this Inventory are 
intended to discriminate between the good and the poor student. 
Ruch says, ‘‘The questions in this self-inventory are ones which have 
been shown to differentiate the effective from the ineffective among 
learners.’’? ; 

At the same time the Pressey Inventory was given, reactions on 
a rating scale for self-adjudged improvement were secured. The 
individuals were requested to report the amount of improvement which 
was made by them since entering college, on thirteen factors which the 
writer considered important for effective learning. A section of this 
rating scale for self-adjudged improvement, devised by the writer, 
is reproduced below. (For a complete set of factors see Table I or II.) 


In the scale below, zero represents your standing with regard to the trait 
in question at the beginning of your college career. If ‘10’ represents the 
perfect position so far as you are concerned, make an “X” at the point on 
the scale which indicates to what extent you have progressed toward this 
perfect position. If you have improved yourself 30 per cent, make an “X” 
at 3, if 50 per cent make an “X” at 5, etc. Critically examine your own 
improvement before recording your mark. 


                                     0   1   2   3   4   5   6   7   8   9   10 
Ability to take notes on lectures    |---|---|---|---|---|---|---|---|---|---| 

The purpose of this study was to indicate a few of the factors 
which seem to be important in discriminating between the inefficient 
freshmen and the effective sophomores relative to techniques of study 
by determining (1) the reliability of each item in the rating scale for 
self-adjudged improvement, (2) the relationship between scores on the 
Study Inventory and the degree of self-adjudged improvement which 
was made since matriculating in college, (3) the relationship between 
the scores on the Study Inventory and the point-ranks in scholarship 
for each freshman and each sophomore, (4) the relative effectiveness 
of the adoption of certain study techniques when considered in the 
light of the Study Inventory, and (5) the techniques which college 
sophomores adopt which show a significant difference of effective 
learning in their favor as contrasted with the apparent ineffective 
procedures of the freshmen. 


RESULTS 


The reliability of each of the thirteen items of the study question- 
naire for self-adjudged improvement was determined by readminis- 
tering the scale to sixty-five sophomores, juniors, and seniors. The 
reliability coefficients with their standard errors are presented in 
Table I. 


TABLE I.—RELIABILITY COEFFICIENTS WITH THEIR STANDARD ERRORS FOR THE 
ITEMS IN THE RATING SCALE FOR SELF-ADJUDGED IMPROVEMENT IN STUDY 
TECHNIQUES—ARRANGED IN ORDER OF MAGNITUDE 

                                                    Reliability   Standard 
Item                                                coefficients  errors 

Ability to take notes on lectures                       .89         .026* 
Ability to distribute time to advantage                 .86         .032 
Ability to concentrate in class                         .82         .041 
Ability to increase vocabulary                          .82         .041 
Improvement in reading rate                             .79         .047 
[item label illegible]                                  .78         .049 
Ability to take notes on collateral reading             .77         .050 
Ability to concentrate while studying                   .75         .054 
[item label illegible]                                  .75         .054 
Improvement in reading comprehension                    .71         .062 
Ability to analyze author’s viewpoint                   .71         .062 
Ability to synthesize materials in course reading, 
  lectures, text, and discussions                       .67         .068 
Ability to take notes on reading                        .64         .074 

*From Dunlap and Kurtz: Handbook of Statistical Nomographs, Tables, and 
Formulas. New York: World Book Co., pp. 32-33. 
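
The standard errors in Table I can be checked with a short computation. The sketch below assumes the classical large-sample formula for the standard error of a correlation, (1 - r^2)/sqrt(N), with N = 65 retested students; the text only cites the Dunlap and Kurtz tables and does not state the formula, so the function name `se_of_r` and the choice of formula are assumptions made for illustration.

```python
import math


def se_of_r(r: float, n: int) -> float:
    """Classical large-sample standard error of a Pearson r: (1 - r^2) / sqrt(n).

    Assumed formula; the article itself only cites published tables.
    """
    return (1.0 - r * r) / math.sqrt(n)


N_RETEST = 65  # sophomores, juniors, and seniors who took the scale twice

# (reliability coefficient, standard error as printed in Table I)
TABLED = [(0.89, 0.026), (0.86, 0.032), (0.82, 0.041), (0.79, 0.047)]

for r, tabled in TABLED:
    computed = se_of_r(r, N_RETEST)
    print(f"r = {r:.2f}  computed SE = {computed:.3f}  tabled SE = {tabled:.3f}")
```

Under this assumption the computed values agree with the tabled ones to within about .001, which suggests, but does not prove, that the tables rest on the same formula.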


The items which show the greatest reliability are the following: 
(1) Ability to take notes on lectures, (2) Ability to distribute time to 
advantage, (3) Ability to concentrate in class, and (4) Ability to 
increase vocabulary. 

It is shown that students are very consistent with regard to how 
much improvement they believe they have made in the above four 
factors closely related to study. Since individuals have very definite 
opinions concerning the amount of improvement which they make on 
these items, guidance on how to study might profitably center around 
these techniques. At least they will offer the teacher several hints on 
the improvement of study procedures, about which the individual does 
not easily vary his opinions. 


226 The Journal of Educational Psychology 


Those items which indicate the least reliability are: (1) Ability to 
analyze the author’s viewpoint, (2) Ability to synthesize materials 
in course reading, lectures, text, and discussions, (3) Ability to take 
notes on reading. The uncertainties of self-adjudged improvement 
attached to the items may be due to some variation in the difficulty 
of different subjects. For example, an analysis of the author’s view- 
point is generally considered more difficult in psychology than in the 
social sciences. Likewise, the ability to take notes on reading might 
vary considerably from course to course because of the inherent nature 
of the subject-matter. Further experimentation should test the 
hypothesis that the ability to take notes on lectures is more 
consistent among individuals, when measured at different times, than 
the ability to take notes on reading, because the lecturer’s verbal 
explanation lessens the content difficulties of lecture material. 

Before any comparisons were made between the freshmen and the 
sophomores it was deemed necessary to show that these groups were 
from one and the same population, i.e., that there was no significant 
difference between them in scholastic aptitude. Percentile scores on 
the American Council Psychological Examination® were used to equate 
these groups. For this purpose, perhaps, no other single measure is 
superior to this scholastic aptitude test because of the nature of the 
factors under consideration in this paper. 

The mean percentiles for the freshmen and sophomore groups on 
the psychological test were 57.82 and 62.08, respectively. The 
sophomores had a mean score of 4.26 percentile points above the 
freshmen. Is this a significant difference? The standard deviations 
indicated a greater scatter of scores for sophomores: freshmen, 
σ = 19.9; sophomores, σ = 24.2. The standard errors of the means were 
calculated: freshmen, 2.92; sophomores, 3.42. The standard error of 
the difference is 4.50, and the ratio of the obtained difference to 
the standard error of the difference is too small to be significant; 
namely, .949. 
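
The arithmetic behind this comparison can be sketched as follows. The block assumes the usual formula for two independent groups, in which the standard error of a difference is the square root of the sum of the squared standard errors; the variable names are illustrative, and only the means and standard errors are taken from the text.

```python
import math


def se_of_difference(se_a: float, se_b: float) -> float:
    """Standard error of the difference between two independent means."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Values reported in the text (percentile scores on the aptitude test)
mean_freshmen, mean_sophomores = 57.82, 62.08
se_freshmen, se_sophomores = 2.92, 3.42

difference = mean_sophomores - mean_freshmen
se_diff = se_of_difference(se_freshmen, se_sophomores)
critical_ratio = difference / se_diff

print(f"difference      = {difference:.2f}")
print(f"SE of difference = {se_diff:.2f}")
print(f"critical ratio   = {critical_ratio:.3f}")
```

The computed standard error rounds to the reported 4.50, and the ratio comes out close to the reported .949; the small gap in the third decimal presumably reflects rounding in the published figures. A ratio below 1 means the difference is smaller than its own standard error.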

In Table II are recorded the correlations, calculated by the Pearson 
product-moment method, between scores on the Study Inventory 
and the ratings of self-adjudged improvement on factors important 
in study. In all of the factors excepting four, the sophomores have 
higher correlations than the freshmen. This closer relationship for 
sophomores between scores on the Inventory, which is intended to dis- 
criminate the “efficient” from the ‘‘inefficient”’ individual in tech- 
niques of study, and ratings of self-adjudged improvement, is not 
unusual. It would be natural to suppose that sophomores because of 
their added maturity and increased adjustment to college life would 
show themselves more efficient as measured by the Study Inventory, 
and at the same time adjudge their improvement on the various 
factors more nearly commensurate with their measured efficiency in 
study. 

TABLE II.—COEFFICIENTS OF CORRELATION BETWEEN SCORES ON A STUDY 
INVENTORY AND RATINGS ON SELF-ADJUDGED IMPROVEMENT ON THIRTEEN STUDY 
FACTORS FOR FRESHMEN AND SOPHOMORES; STANDARD ERRORS OF THE r’s; AND 
EXPERIMENTAL COEFFICIENTS 

                                     Sophomores     Freshmen              SE of   Experi- 
Study inventory scores                     SE             SE              differ- mental 
with ability to                      r     of r     r     of r    Diff.   ence    coefficient 

Improve reading comprehension       .39    .120    .07    .147    .32     .190      .60 
Take notes on collateral reading    .35    .124    .10    .145    .25     .191      .47 
Distribute time to advantage        .33    .126   -.01    .147    .34     .194      .63 
Earn grade-point scholarship        .31    .128   -.04    .147    .35     .195      .65 
[label illegible]                   .31    .128    .10    .145    .21     .193      .39 
Concentrate while studying          .28    .130    .07    .147    .21     .196      .39 
Use the library                     .28    .130   -.07    .147    .35     .196      .65 
Improve reading rate                .26    .132    .03    .147    .23     .198      .42 
Concentrate in class                .20    .136   -.39    .125    .59     .185     1.16 
Analyze author’s point of view      .18    .137   -.30    .134    .48     .192      .91 
Take notes on lectures              .02    .141    .12    .145    .10     .202      .19 
Increase vocabulary                 .01    .141    .14    .145    .13     .202      .23 
Synthesize materials of the course -.002   .141    .20    .141    .202    .199      .37 
Take notes on reading from text     .18    .137    .28    .136    .10     .193      .19 

Higher r’s for the sophomores may be partly due to the more 
accurate self-appraisal of improvement which, it would be expected, 
the more mature sophomores might give. If this assumption may be 
accepted, it would mean that sophomores can more accurately recog- 
nize the factors which are important in study, while the freshmen are 
hampered by the inability to perceive their own problems bearing 
upon efficiency. 

Concentration in the classroom situation is a factor which should 
receive considerable emphasis in orientation and in how-to-study 
discussions. Table II indicates that the experimental coefficient is 
1.16 in favor of a closer relationship between scores on the Study 
Inventory and ratings of power of concentration among sophomores 
than among freshmen. Many college freshmen, it appears, have not 
learned to concentrate intently in high school. The positive relation- 
ship between the sophomore scores on the Study Inventory and ratings 
for self-adjudged improvement relative to concentration, and, in con- 
trast to it, the marked negative relationship for freshmen, determine 
the seriousness of the problem for the latter group, the r’s being .20 
and —.39, respectively. Valuable results might accrue from specific 
instruction with the purpose of developing concentration in the class- 
room among entering college students. 
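
The article does not define its "experimental coefficient." A reasonable guess, offered here only as an assumption, is McCall's experimental coefficient: the difference divided by 2.78 times the standard error of the difference. The sketch below applies that definition to the concentrate-in-class row of Table II, again assuming (1 - r^2)/sqrt(N) for the standard error of each r, with 50 sophomores and 46 freshmen as stated in the text. All function names are hypothetical.

```python
import math


def se_of_r(r: float, n: int) -> float:
    """Assumed large-sample standard error of a Pearson r."""
    return (1.0 - r * r) / math.sqrt(n)


def experimental_coefficient(r_a: float, n_a: int, r_b: float, n_b: int):
    """Return (difference, SE of difference, difference / (2.78 * SE)).

    The 2.78 divisor follows McCall's definition, which is an assumption
    here; the article does not say how its coefficients were computed.
    """
    se_diff = math.sqrt(se_of_r(r_a, n_a) ** 2 + se_of_r(r_b, n_b) ** 2)
    diff = r_a - r_b
    return diff, se_diff, diff / (2.78 * se_diff)


# "Concentrate in class": sophomores r = .20 (N = 50), freshmen r = -.39 (N = 46)
diff, se_diff, coeff = experimental_coefficient(0.20, 50, -0.39, 46)
print(f"difference = {diff:.2f}")
print(f"SE of difference = {se_diff:.3f}")
print(f"experimental coefficient = {coeff:.2f}")
```

Under these assumptions the difference and its standard error match the tabled .59 and .185, and the coefficient lands within about .01 of the tabled 1.16. On McCall's scale a coefficient of 1.0 is read as practical certainty that the true difference is greater than zero, which is consistent with the emphasis the author places on this one factor.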

Besides the concentration-of-attention factor, four other factors 
with experimental coefficients in descending order should be mentioned: 
(1) Ability to analyze the author’s point of view, (2) Ability to use the 
library, (3) Ability to distribute time to advantage, (4) Ability to 
improve reading comprehension. 


SUMMARY 


(1) Fairly high reliability coefficients were obtained by readminis- 
tration of the rating scale of self-adjudged improvement, the four 
most reliable factors being: Ability to take notes on lectures, .89; 
Ability to distribute time to advantage, .86; Ability to concentrate 
in class, .82; and Ability to increase vocabulary, .82. 

(2) When there is no significant difference in scholastic aptitude 
between a freshman and sophomore group, the experimental coeffi- 
cient indicates that the sophomores have more thoroughly grasped the 
importance of the ability to concentrate in class. 

(3) When a Study Inventory is used as a criterion for the discovery 
of “efficient” and “ineffective” study habits, sophomore college 
students almost consistently show higher r’s between scores on such 
an Inventory and ratings on self-adjudged improvement relative to a 
number of study factors. 

(4) A much closer relationship between point-rank scholarship and 
scores on the Study Inventory for sophomores as contrasted with 
freshmen leads one to believe that sophomores adopt a larger number of 
satisfactory study habits which account for this increased relationship. 


NOTE ON RETEST RESULTS ON THE ACE 
PSYCHOLOGICAL EXAMINATION FOR COLLEGE 
FRESHMEN 


WILLIAM A. THOMSON 


Carleton College 


The use of scholastic aptitude tests for the selection and counselling 
of college students has led to the development of various methods of 
obtaining test scores for incoming freshmen. The Association of 
Minnesota Colleges, in coöperation with the Testing Bureau of the 
University of Minnesota, has developed a program whereby all high- 
school seniors in the State are tested in December or January of each 
academic year with the American Council on Education Psychological 
Examination and the Coöperative English Test. Test results are made 
available to the various colleges in the State as soon as the scoring and 
computation of percentile rankings are completed. 

The present study provides a comparison of the test results obtained 
for one hundred six college freshmen who took one form of the ACE 
psychological examination as high-school seniors in Minnesota and 
another form of the same test upon entering Carleton College. As 
high-school seniors they took the 1935 form in January, 1937, in 
their respective high schools and under local administration, which 
presumably complied with the instructions sent out from the Uni- 
versity of Minnesota Testing Bureau. As a part of the freshman 
class entering Carleton College in September, 1937, they took the 
1937 form of the same test. 

Table I presents the correlations between the raw scores on various 
parts of the two forms of the test. With the exception of the analogies 
test, the correlations between various subtests show little variation. 
It would seem that the lower correlation for the analogies test and the 
higher correlation for the gross score should be interpreted as due to 
the inherent nature of the test itself. Taylor’s¹ results seem to be in 
line with this interpretation. Studying test-retest results on the ACE 
test, he concluded that “factors responsible for gains are specific 
for each individual test.” His phrase “each individual test” refers to 
parts of the ACE test. 





1 Taylor, H. R.: “The Effect of Time Interval on the Reliability of ACE 
Psychological Examination Scores.” Psychological Bulletin, Vol. xxxii, 
1935, pp. 545-546. 

TABLE I.—CORRELATIONS BETWEEN RAW SCORES ON TWO FORMS OF THE ACE 
TEST TAKEN EIGHT MONTHS APART 

[subtest label illegible]                               .696 
[subtest label illegible]                               .784 
Analogies                                               .530 
[subtest label illegible]                               .707 
[subtest label illegible]                               .730 
Gross score                                             .869 


Some persons may be inclined to think of the correlations in 
Table I as reliability coefficients. It is doubtful whether they can be 
so considered in a technical sense.¹ The coefficients here reported 
reflect not only the similarities and differences in the test forms per se 
but also the reliability of the student, changes in educational back- 
grounds during an eight-month period, and variations in attitudinal, 
emotional, and physical factors. Jordan² has demonstrated empiri- 
cally that reliability coefficients between forms of the same test are 
lower than those determined by the split-half method. Both he and 
Dunlap,³ as well as others, have pointed out that test reliabilities as 
determined by different forms measure not only the reliability of the 
test but also the reliability of the testee. 

Taylor does not report correlations between part scores on the test, 
but it is interesting to note that Cowdery⁴ reports a correlation of .889 
between retests with the same form of the Thorndike examination for 
intervals of less than a year. His results agree closely with the coeffi- 
cient of .869 found for the two forms used in this study. 

The results presented thus far have been for the raw scores on the 
two forms of the test. In order to get more direct comparisons, gross 
scores on the 1937 form of the test were converted into equivalent 
scores for the 1935 form by the use of Thurstone’s conversion tables. 

Table II presents the correlations between the two forms of the 
test, and between the two forms of the test and grade-point averages as 





1 Thurstone reports reliability coefficients ranging from .79 to .98 for various 
parts of the 1929 edition using the odd-even technique and the Spearman-Brown 
formula. The lowest reliability coefficient reported is for the analogies test. See 
Thurstone, L. L., and Thurstone, T. G.: “The 1929 Psychological Examination.” 
The Educational Record, Vol. xi, April, 1930, pp. 101-128. 

2 Jordan, R. C.: “An Empirical Study of the Reliability Coefficient.” Journal 
of Educational Psychology, Vol. xxvi, September, 1935, pp. 416-426. 

3 Dunlap, J. W.: “Comparable Tests and Reliability.” Journal of Educational 
Psychology, Vol. xxiv, September, 1933, pp. 442-453. 

4 Cowdery, K. M.: “Repeated Thorndike Intelligence Examinations.” School 
and Society, Vol. xxvii, March 24, 1928, pp. 367-369. 
computed at the end of the first semester of college work. Neither the 
difference between the correlations of converted and unconverted 
scores on the 1937 form with the 1935 form, nor the differences between 
the correlations of various test scores with grade-point averages are 
significant. As far as the total group is concerned it is apparently of no 
consequence whether the tests were administered in high school or 
college when the relationship with college grades is desired. 


TABLE II.—CORRELATIONS BETWEEN THE TWO FORMS OF THE ACE TEST AND 
BETWEEN THE TEST AND FIRST SEMESTER GRADE-POINT AVERAGES 

                                              1935         Grade-point 
                                              raw score    averages 

1937 scores converted into 1935 scores          .848          .594 
1937 raw scores                                 .869          .583 
1935 raw scores                                               .568 

The above results should not be interpreted to mean that individual 
changes in test scores did not occur from one testing to another. The 
coefficient of .869 between the raw scores for the two forms is evidence 
that changes in rankings did take place. The average score on the 
1937 form after scores had been converted into 1935 scores was fourteen 
and five-tenths points higher than the average raw score on the 1935 
form. There are ninety-six chances out of one hundred that this 
difference is statistically significant. 

The difference in means between the two forms of the test is no 
doubt due in part to practice effect. However, a positive practice 
effect cannot be the only explanation, since individual changes in gross 
scores ranged from a loss of forty-three points to a gain of seventy-six 
points. Table III presents the distribution of plus and minus changes 
occurring between the two tests. 

Approximately forty-nine per cent of the scores changed twenty or 
more points; while approximately five per cent changed forty or more 
points. Approximately thirty per cent of the scores were lower on 
the second testing, while approximately seventy per cent were higher. 
It would appear that practice effect and attitudinal, emotional, and 
physical changes in the students must be thought of as accounting 
for such extremes in score changes. Coaching on the test is a possible 
explanation of extreme improvement in test performance, but whether 
this may have occurred in this case is not known. 

TABLE III.—CHANGES IN EQUIVALENT GROSS SCORES FOR TWO FORMS OF THE 
ACE TEST WHEN ADMINISTERED EIGHT MONTHS APART 

Gains in                      Losses in 
gross score    Frequency      gross score    Frequency 

  70-79            2             1-10           18 
  60-69            1            11-20            7 
  50-59            1            21-30            4 
  40-49           12            31-40            2 
  30-39           14            41-50            1 
  20-29           15 
  10-19           19 
   0-9            10 

The total range of score change was approximately one hundred 
twenty points. In terms of percentiles for freshmen entering four- 
year colleges,¹ a difference of one hundred twenty points between 
two scores would cover the entire middle range of scores from the 
fifteenth to the eighty-seventh percentile. A change of twenty or 
more points, which occurred in about one-half of the cases in this 
study, would result in a change of approximately twelve percentiles 
in the middle range of the distribution and an insignificant change 
at the extremes of the distribution. A difference of forty points would 
result in a change of approximately twenty-four percentiles in the 
middle of the distribution, and an insignificant change at the extremes. 
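
The proportions of gains and losses quoted in the text can be recomputed from the frequencies in Table III. The sketch below is only a check on that arithmetic; because the table is binned, the count of changes of "twenty or more points" is approximated by taking the gain intervals starting at 20 and the loss intervals starting at 21, which is an assumption about how the boundary cases were classified.

```python
# (lower bound, upper bound, frequency) for each interval in Table III
GAINS = [(0, 9, 10), (10, 19, 19), (20, 29, 15), (30, 39, 14),
         (40, 49, 12), (50, 59, 1), (60, 69, 1), (70, 79, 2)]
LOSSES = [(1, 10, 18), (11, 20, 7), (21, 30, 4), (31, 40, 2), (41, 50, 1)]

n_gains = sum(freq for _, _, freq in GAINS)
n_losses = sum(freq for _, _, freq in LOSSES)
n_total = n_gains + n_losses

# Intervals lying wholly at or beyond a twenty-point change
n_change_20 = (sum(freq for low, _, freq in GAINS if low >= 20)
               + sum(freq for low, _, freq in LOSSES if low >= 21))

pct_lower = 100.0 * n_losses / n_total
pct_higher = 100.0 * n_gains / n_total
pct_change_20 = 100.0 * n_change_20 / n_total

print(f"students: {n_total}")
print(f"lower on second testing:  {pct_lower:.1f} per cent")
print(f"higher on second testing: {pct_higher:.1f} per cent")
print(f"changed about twenty points or more: {pct_change_20:.1f} per cent")
```

The frequencies sum to the one hundred six students described in the study, and the recomputed shares agree with the approximate thirty, seventy, and forty-nine per cent given in the text.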

Some interest may be attached to the relation between test score 
gains or losses and original raw scores. The correlation between gains 
and original raw scores on the 1935 form was —.070, and the correla- 
tion between losses and original raw scores was —.059. Neither of 
these coefficients is sufficiently different from chance expectation to be 
significant. The correlation between size of test score change, either 
plus or minus, and original test scores was —.102, showing a very 
slight tendency for the largest differences in test scores to occur among 
the low scoring students. 

It has been said that an intelligence test score does not necessarily 
measure the maximum intelligence of the individual, that all we know 
after the test is given is that the individual has at least that amount 
of intelligence. Assuming adequate administration, it might be 





1 Thurstone, L. L. and Thurstone, T. G.: “The 1935 Psychological Examina- 
tion.” The Educational Record, Vol. xvi, April, 1936, pp. 296-317. 
reasoned that the higher of two intelligence test scores is the more 
accurate. If this is so, would the selection of the higher of two scores 
give a better correlation with academic achievement? In order to 
find an answer to this question the higher of the two scores, after 
converting the scores for the 1937 form into equivalent 1935 scores, for 
each individual was correlated with the grade-point average. The 
resulting coefficient was .556. This coefficient is not as large as, or 
significantly different from, those found when raw scores on either form 
or converted scores were used. 
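
The procedure just described, taking each student's higher score and correlating it with the grade-point average, can be sketched as below. The scores and grade-point averages are invented purely for illustration and are not the Carleton data; the function name `pearson_r` and all variable names are likewise hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for six students (NOT taken from the study)
score_1935 = [150, 182, 201, 167, 220, 143]            # raw score, 1935 form
score_1937_converted = [171, 175, 230, 190, 214, 160]  # 1937 form on the 1935 scale
grade_point = [1.4, 1.9, 2.6, 1.8, 2.2, 1.5]

# Keep whichever of the two comparable scores is higher for each student
higher_score = [max(a, b) for a, b in zip(score_1935, score_1937_converted)]

print("r, 1935 raw score vs. grades:       ", round(pearson_r(score_1935, grade_point), 3))
print("r, converted 1937 score vs. grades: ", round(pearson_r(score_1937_converted, grade_point), 3))
print("r, higher of two scores vs. grades: ", round(pearson_r(higher_score, grade_point), 3))
```

With real data the three coefficients would be compared as in the text; nothing about the invented numbers bears on the reported value of .556.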

The above results would seem to indicate that when dealing with 
large groups for purposes of predicting college success it makes little 
difference whether the ACE test was given during the last year of 
high school or upon entrance to college. When evaluating individual 
test performance it is to be remembered that the chances are approxi- 
mately even that the gross score will vary by as much as twenty points 
and that the chances are approximately one in three that this variation 
will be in the direction of a lower score. The amount of variation 
and its direction bear no appreciable relation to the size of the score. 











BOOK REVIEWS 


CHARLES H. JUDD. Educational Psychology. Boston: Houghton- 
Mifflin Co., 1939, pp. 566. 


This book is the first in a proposed series of basic professional 
texts in Education to be written by various members of the Depart- 
ment of Education at the University of Chicago. The twenty-nine 
chapters are grouped under four major headings as follows: Part One, 
Physical Heredity and Behavior; Part Two, Social Heredity; Part 
Three, Personality; and Part Four, Psychological Solutions of Educa- 
tional Problems. 

Part One is decidedly biological, including chapters on adaptation, 
the nervous system, inheritance, the emotions, and perception. Part 
Two, Social Heredity, represents a new name for old content—the 
psychology of the school subjects. Part Three includes a wide range 
of topics—intelligence measurement, maturation and personality 
development, mental abnormalities, and generalization. Part Four 
catches what is left—mental discipline, individual differences, super- 
vised study, lesson planning, etc. As the reader will infer there is 
nothing novel or refreshing about the topics considered. 

Educational Psychology is not in any sense a revised edition of 
Psychology of Secondary Education (Ginn and Co., 1927). The 
reviewer went to some pains to compare the chapters dealing with 
language and mathematics in the two books. The former topic 
is discussed much more completely in the new book—six chapters 
(one hundred thirty pages) as compared with three chapters (eighty 
pages) in the 1927 text. The space given to mathematics is approxi- 
mately the same in both volumes but the genesis of mathematical 
ability is elaborated upon at much greater length in Educational 
Psychology. The more recent text competes for the entire market 
inasmuch as it has been written for prospective teachers in both 
elementary and secondary schools. 

Judd’s definition of educational psychology is rather staggering 
(p. 3): 

Educational psychology may be defined as the science which describes and 
explains the changes that take place in individuals as they pass through 
various stages of development from birth to maturity. 


Most educational psychologists will agree that this is an unneces- 
sarily broad definition. Many changes in individuals have never been 
considered within the field of educational psychology. The adjective 
“educational” itself constitutes a delimiting concept. Such a defini- 
tion would make educational psychologists out of pediatricians and 
optometrists. 

Judd himself, in the subsequent development of his text, does not 
attempt to describe and explain all changes. He concentrates on 
behavior and evinces most interest in changes resulting from learning. 
The definition is more than carelessness, however. One weakness of 
the text is its inclusiveness. No one can even try to describe and 
explain all that Judd tries to describe and explain without getting very 
thin in spots. One paragraph (p. 15) on the “Fundamental properties 
of protoplasm” probably contributes very little of genuine value to a 
prospective school teacher’s understanding. The same comment 
might be made regarding a page and a half dealing with the “Func- 
tions of the nervous system” or one paragraph (p. 113) on the “Educa- 
tion of the Emotions.” The reviewer would be interested to see some 
undergraduate read these materials and then have Judd attempt 
to determine the effect upon the student’s “Higher Mental 
Processes.” 

The reviewer found himself asking this question frequently during 
his reading of the first part of the text (Physical Heredity and Behav- 
ior): “So What? What are the big ideas teachers in training should 
get from all this which will help them in their instruction of the 
young?” The chapter on Emotions illustrates the point. Here 
are most of the paragraph headings: Consciousness and Neural 
Processes; Theory of the Emotions Formulated by James; Conscious- 
ness and Motor Processes; Darwin’s Discussion of the Emotions; 
The James-Lange Theory of Emotions; The Emotion of Fear; Of 
Disappointment; Motor Tendencies; Aesthetic Appreciation; Attitudes 
(Natural and Acquired); and the single paragraph referred to above 
on the “Education of the Emotions.” This made interesting reading, 
but the most significant concept to the teacher, “The Education of 
the Emotions,” was barely recognized and named. By this comment 
the reviewer does not intend to imply that subject-matter such as is 
included under the above headings is not important. All values are 
relative, however, and theoretical discussions of the emotions are 
relatively of little consequence to teachers in training. 

As those familiar with his other writings might suspect, Judd is 
at his best writing about the higher mental processes. Early in the 
text he states his position with admirable restraint (p. 60): “There is 
some tendency in current educational theory and practice to neglect 
the distinction between lower and higher forms of behavior.” Judd 
is convinced, and the reviewer agrees heartily, that the latter should 
occupy the major attention of school teachers. “Because 
the child ought to digest his food properly, it has been assumed that 
the school has equal responsibility for training digestion and for pro- 
moting arithmetical thinking” (p. 61). This position, in the author’s 
judgment, is untenable. This explains his emphasis upon generaliza- 
tion, the extension, systematization, and organization of experience, 
analysis and abstraction. There are five complete chapters dealing 
with these concepts and they are discussed incidentally throughout 
the text. A great deal is made of the fact that not all learning is
“progressive” in the sense of contributing to an expanding
organization of experience. “It is entirely possible for the mind to
acquire an item of knowledge and store it in memory in a way which
makes it unproductive for the future” (p. 498). This represents
useless learning. 

The author devotes no space to a description of systematic points 
of view. There are no quotations of or references to Gestalt,
behavioristic, or psychoanalytic psychology as such. The names of Köhler
or Freud do not appear in the index and the only reference to Koffka 
or Watson is the inclusion of one book by each among the supple- 
mentary readings. 

The reviewer is of the opinion that an understanding of the evalua- 
tion of behavior and achievement should result from a systematic 
study of educational psychology. While Judd’s text refers continually 
to investigations, the data in which have resulted from evaluation, he 
nowhere deals with the topic adequately. In Chapter 28, “Methods
of Psychological and Educational Investigation,” a few pages are 
devoted to the importance of measurements, and two pages to the 
nature, merits and demerits of the new type examination. The 
reviewer is at a loss to know where the teachers-to-be will learn about 
measurement in its broadest aspects if their training institutions adopt 
in toto the series of basic texts that Judd and his former colleagues 
propose to write. If serious treatment of the topic does not find a 
place in an Educational Psychology it hardly could be expected in the 
other volumes; namely, The American Educational System; School 
Administration for the Teacher; or Schools and the Social Order. This 
same neglect of measurement was apparent in Judd’s Psychology 
of Secondary Education (1927). 

Mechanically, Judd’s book leaves little to be desired. The Table
of Contents is unusually complete as is the Index. The figures and 
illustrations are plentiful (numbering seventy-three), clear, and many 
of them are apparently original. The type is easy to read, the topical 
headings well set off from the body of the text, and the amount of 
quoted material in small type has been kept at a minimum. The 
supplementary readings should be available in any college library 
regardless of when the depression struck. More than two-thirds of 
the books suggested for further reading were copyrighted prior to 
1930, and only about one in ten had been published after 1934. Inas- 
much as the great majority of these references are texts the implica- 
tion regarding the recency of Judd’s facts is clear. It is quite possible 
that the inferences would be the same were they based upon current 
research but the inclusion of more of the latter would have made the 
copyright date of the present text somewhat more meaningful. 

STEPHEN M. COREY.
University of Wisconsin. 


PETER SANDIFORD. Foundations of Educational Psychology: Nature’s
Gifts to Man. New York: Longmans, Green and Co., 1939, 
pp. 464. 


This text, the first of a series of volumes designed to survey the 
foundations of educational psychology, is planned primarily for 
advanced students. After an introductory section on method and 
historical and systematic orientation, the author has presented major 
chapters dealing with heredity and environment, individual differences, 
physiological foundations of behavior, unlearned behavior, intelligence, 
and personality. 

There is an historical approach to most discussions throughout 
the book. In a few instances, such as Mendelism, the material is very 
detailed. On the whole, however, one is impressed by the condensa- 
tion which in places simulates an annotated outline. The latter was 
necessary, of course, to cover the wide range of data available. It is 
noteworthy that this condensation has been done in a masterly fashion. 
The author’s statement that any chapter could be used as the basis 
for a short course illustrates the comprehensiveness of each section. 

The fundamental nature of the material is obvious. Extensive 
citations from genetics, physiology, general and abnormal psychology 
are presented. Perhaps this is carried beyond reasonable limits at 
times, especially with respect to physiology. Also certain historical
and systematic materials seem superfluous. For instance, why should 
theories on mind-body relationships be presented? Although the 
author’s objective viewpoint is in line with recent trends in American 
psychology, the statement that introspective psychology yields results 
of doubtful scientific validity is easily refuted. Another instance 
of an untenable statement is “the difficulty of establishing peace on 
earth may be attributed to the pugnacious instincts of the human 
male.” 

Controversial issues are ably handled. Data and opposing 
views are presented and critically evaluated. Gaps in present knowl- 
edge are indicated. In no instance does the author dogmatically 
take sides. This is well illustrated in the heredity versus environment 
discussion where it is emphasized that these are correlative factors. 

The organization of materials in the book is excellent. It is 
clearly demonstrated that the author is master of the fields considered. 
While all chapters are good, the section on the nature and measurement 
of intelligence is probably the best and the one on personality the 
weakest. In the latter, most of the space is devoted to theory and 
techniques of measurement to the sacrifice of experimental data. 
Although the citation of books and monographs in the bibliographies 
is adequate, there is unfortunately an almost complete absence of 
citations from scientific journals. 

The instructors who are prospective users of this book must keep 
two things in mind. It is not an elementary text, and there must 
necessarily be much supplementary material furnished to fill in the 
outlines laid down by the author. Nevertheless, the book will prob- 
ably be considered a landmark in the field. Educational psychologists 
will be anxious to see subsequent volumes in the series. 

MILES A. TINKER.
University of Minnesota. 


EDNA W. BAILEY, ANITA D. LATON, AND ELIZABETH L. BISHOP.
Studying Children in School. New York: McGraw-Hill Book 
Company, 1939, pp. 182. 


The authors of the present volume and the accompanying work- 
books hold it as desirable, even necessary, that teachers should 
have some understanding of the children they teach. To secure 
such understanding requires that the teacher or student do something 
more than read books and take courses. He must see children as
living, growing beings and not as objects to whom arithmetic or 
spelling must be taught. On the other hand it cannot be expected 
that the teacher should have extensive technical training in child 
psychology. 

The purpose of this book is to present an outline of child develop- 
ment and behavior which the teacher may follow in observing a 
child or children in his own room. The text material describes 
bases for appraisement of children, characteristics of age levels, and 
illustrative material from the study of children at the pre-school, 
elementary-school and high-school levels. There is also an excellent 
annotated bibliography of books on all phases of child life. 

Accompanying the text are workbooks, one for each of the levels 
discussed, which contain blank spaces for observation on one child. 
The workbooks follow the same design as the text, and are arranged 
so that the record of observation in one area may be removed. The 
arrangement of materials is well executed, the topics are pertinent, 
and the text discussion well organized. This work should be of high 
value for courses in child psychology for teachers or parents, and it
might well be used for independent study by anyone working with a 
child or several children.

C. M. LOUTTIT.
Indiana University.


CHARLES WATTERS ODELL. The Secondary School. Champaign,
Illinois: The Garrard Press, 1939, pp. 606.


This text was written expressly for inexperienced undergraduates 
who have had little if any previous professional training. Part I is 
devoted to a quick (115 pp.) overview of comparative education from 
both the historical and modern points of view. The chapters on 
secondary education in European and other foreign countries are 
based upon sources which have appeared in the English language 
and are brought up to about 1938. Because of the scope of the text 
certain of the topics are treated superficially. Ancient and medieval 
secondary education, for example, are described in less than seven 
pages. Mexican secondary education gets two paragraphs and the 
whole of South America three. Most of the generalizations in these 
brief summaries seemed to the reviewer to be defensible. Chapters 
8, 9 and 10 trace sketchily the historical development of secondary 
education in the United States. 

The second part of the book amounts to a consideration of our
secondary-school population. There are chapters on the charac- 
teristics of adolescents and individual differences among adolescents. 
The treatment here, as elsewhere, is inferential. No data to speak 
of are presented. Part three deals with the curriculum, including 
separate chapters on most of the conventional high-school subjects. 
Parts four, five, and six describe extra-curricular activities, articula- 
tion, and the high-school staff, respectively. 

Odell’s book is so comprehensive that it is probably too much to 
expect a considerable amount of factual material. Undoubtedly 
undergraduates will know more about secondary education after 
reading such a text but it will be information acquired at a considerable 
price. Nothing but the bibliographies relieves the monotony of six
hundred pages of reasonable but dull assertions.

In his preface Odell points out that he did not intend to include 
“any considerable amount of new material or large number of original
ideas.” The reviewer rather wishes that he had. The book would
have been more stimulating.

STEPHEN M. COREY.
University of Wisconsin.


CHINESE PSYCHOLOGISTS IN NEED OF BOOKS AND JOURNALS


Many Chinese psychologists, having lost all their books, journals, 
reprints, etc. through bombing, are desperately in need of any psycho- 
logical printed matter which American psychologists care to spare. 
Professor Gardner Murphy, of Columbia University, has undertaken 
to make shipments of such books and journals as may be sent to him 
to western China, to which several of the east China universities have 
moved and are still carrying on their work under extreme difficulties. 
The first shipment will be made early in the Summer, and it is hoped
that another shipment may be made in the Fall. Those who may care to
forward material to Professor Murphy should take advantage of the 
present low postage rate on books: if the package is marked “Books”
and contains no writing, the rate is one and one-half cents per pound. 
Shipments should be addressed to Gardner Murphy, Department of 
Psychology, Columbia University, New York City. 





